
Preliminary Recommendations

UPenn Treebank

This section gives a short survey of the second phase of syntactic annotation of the UPenn Treebank Project, as a summary of the publications Marcus et al. (1993) and Marcus et al. (1994).

Annotation in phase I

The first phase of the UPenn Treebank project (from November 1989 to December 1992) produced 4.5 million words of text, all tagged for part of speech and two thirds of it skeletally bracketed. The text was automatically tagged and parsed (using the Fidditch partial parser) and then corrected by hand. The syntactic analysis used a modified form of the Lancaster Treebank approach, with a context-free structure.

The main goals of the second phase, which began in 1993, were the following:

Mark predicate-argument structure
-- Most importantly, the notation should allow automatic extraction of predicate-argument structure.

Provide a richer syntactic annotation
-- The aim is broader coverage of the syntactic phenomena described. (See the table of old and new syntactic labels.)

Describe non-continuous structures/dependencies
-- This was formerly difficult because of the restrictions of a context-free mechanism.

Mark grammatical functions and semantic relations
-- How are subconstituents semantically related to their predicates? At a minimal level, logical subjects and objects are to be marked.

Provide a standard objective methodology for parsers
-- Marking predicate-argument structure makes it easier to compare surface parsers with parsers that rely on a deeper syntactic representation (i.e. one closer to predicate-argument structure).

Provide a better consistency of analyses
-- Correction of some previous inconsistencies in syntactic tagging.

Annotation in phase II

The following issues were to be addressed in phase II of the project:

Consistency

Example: A predicate is always either

(S (NP-SBJ I)

   (VP consider

      (S (NP-SBJ Kris)

         (NP-PRD a fool))))
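Because the annotation is plain nested bracketing, a tree such as the one above can be read mechanically. The following Python sketch is illustrative only and is not part of the Treebank project's own tooling; the names `parse`, `words` and `with_tag` are invented here. It reads a single bracketed tree and lists the constituents that carry a given function tag, such as -PRD.

```python
import re

def parse(text):
    """Parse one bracketed tree into (label, children); leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree

def words(tree):
    """Return the leaf tokens of a parsed tree, left to right."""
    out = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else words(child))
    return out

def with_tag(tree, tag):
    """Yield subtrees whose label has `tag` among its hyphen-separated parts."""
    label, children = tree
    if tag in label.split("-")[1:]:
        yield tree
    for child in children:
        if not isinstance(child, str):
            yield from with_tag(child, tag)

TREE = parse("(S (NP-SBJ I) (VP consider (S (NP-SBJ Kris) (NP-PRD a fool))))")
print([(t[0], " ".join(words(t))) for t in with_tag(TREE, "PRD")])
# [('NP-PRD', 'a fool')]
```

The sketch assumes well-formed input; an unbalanced tree makes it fail with an error rather than return a partial result.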

Null elements

Some examples are shown below of the use of null elements:

Discontinuous elements

The context-free mechanism of Phase I led to the trapping problem when a sentence-level adverb is followed by verb complements.

With context-free notation, one can do one of the following, each with its own consequences:

In UPenn II, discontinuous elements are called pseudoattached:

Simple extraposition: *ICH*

(S (NP-SBJ Chris)

   (VP knew

       (SBAR *ICH*-1)

       (NP-TMP yesterday)

       (SBAR-1 that

             (S (NP-SBJ Terry)

                (VP would

                   (VP catch

                      (NP the ball)))))))
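The link between a placeholder such as *ICH*-1 and the constituent labelled -1 can be followed automatically by matching the numeric indices. The Python sketch below is one possible way to do this, under the assumption that placeholders have the form `*XXX*-n` and that the displaced constituent's label ends in `-n`; the helper names are invented for illustration.

```python
import re

def parse(text):
    """Parse one bracketed tree into (label, children); leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1
        return (label, children)

    return node()

def words(tree):
    out = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else words(child))
    return out

PLACEHOLDER = re.compile(r"\*([A-Z]+)\*-(\d+)$")  # e.g. *ICH*-1
INDEXED = re.compile(r"-(\d+)$")                   # e.g. SBAR-1

def link_placeholders(tree):
    """Return (kind, index, host label, text of the coindexed constituent)."""
    targets = {}   # index -> subtree whose label ends in -index
    holders = []   # (kind, index, label of the node holding the placeholder)

    def walk(node):
        label, children = node
        m = INDEXED.search(label)
        if m:
            targets[m.group(1)] = node
        for child in children:
            if isinstance(child, str):
                m = PLACEHOLDER.match(child)
                if m:
                    holders.append((m.group(1), m.group(2), label))
            else:
                walk(child)

    walk(tree)
    return [(kind, idx, host, " ".join(words(targets[idx])))
            for kind, idx, host in holders if idx in targets]

ICH = parse("(S (NP-SBJ Chris) (VP knew (SBAR *ICH*-1) (NP-TMP yesterday) "
            "(SBAR-1 that (S (NP-SBJ Terry) (VP would (VP catch (NP the ball)))))))")
print(link_placeholders(ICH))
# [('ICH', '1', 'SBAR', 'that Terry would catch the ball')]
```

The result says that the empty SBAR directly after "knew" is to be interpreted as the clause "that Terry would catch the ball", which is what the pseudo-attachment is meant to encode.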

True ambiguity: *PPA*

Even given context, human annotators cannot resolve the ambiguity.

(S (NP-SBJ I)

   (VP saw

       (NP (NP the man)

           (PP *PPA*-1))

       (PP-CLR-1 with

                 (NP the telescope))))

Right node raising: *RNR*

The same constituent appears to have been shifted out of both conjuncts.

(S But

   (NP-SBJ-2 our outlook)

   (VP  (VP has

            (VP been

                (ADJP *RNR*-1)))

         ,

         and

         (VP continues

             (S (NP-SBJ *-2)

                (VP to

                    (VP be

                        (ADJP *RNR*-1)))))

         ,

         (ADJP-1 defensive)))

Extraposed sentences with `it'

(S (NP-SBJ (NP It)

           (S *EXP*-1))

   (VP is

       (NP a pleasure))

   (S-1 (NP-SBJ *)

        (VP to

            (VP teach

                (NP her)))))



pleasure(teach(*someone*, her))

Semantic roles/grammatical functions

As it is very difficult to determine a set of underlying semantic roles, the UPenn II Project restricted itself to the clearly distinguishable semantic roles listed below. The given list mostly relates to adjuncts.

Semantic roles

Grammatical Functions

The special class `closely related' is used for constructions that occupy the middle ground between arguments and adjuncts. These constructions correspond to Quirk et al.'s class of predication adjuncts and to some phrasal verbs (Quirk et al., 1985). Usage:

Examples are given below for the tags -DTV, -PRD, -TPC, -CLF and -PRP:

-DTV

(S (NP-SBJ Aristotle)

   (VP gave

       (NP the book)   

       (PP-DTV to

               (NP Plato))))







(S (NP-SBJ Aristotle)

   (VP gave

       (NP Plato)

       (NP the book)))

-PRD

Non-VP predicates

(SQ Was

    (NP-SBJ he)

    (ADVP-TMP ever)

    (ADJP-PRD successful)

    ?)





(SINV and

     (ADVP-PRD-TPC-1 so)

     (VP did

         (ADVP-PRD *T*-1))

     (NP-SBJ the hippopotamuses))

-TPC

(S (PP-TPC-12 Of

        (NP (NP the 500 barbers)

                (PP-LOC in

                    (NP Philadelphia))))

   ,

   (NP-SBJ (NP (QP only 10))

           (PP *T*-12))

   (VP know

       (SBAR (WHNP-13 what)

             (S (NP-SBJ they)

                (VP are

                    (VP doing

                        (NP *T*-13)))))))

-CLF

(S-CLF (PP-TMP In

               (NP the past))

       ,

       (NP-SBJ it)

       (VP has

           (VP been

               (NP-PRD-2 the

                         husband)

                   (SBAR (WHNP-1 who)

                         (S (NP-SBJ *T*-1)

                            (VP has

                               (VP been

                                  (ADJP-PRD-3 dominant))))))))

-PRP

(S (NP-SBJ-1 (NP activity)

             (PP-LOC at

                   (NP (NP a number)

                       (PP of

                           (NP brokerage

                               houses)))))

    (VP was

        (VP curtailed

            (NP *-1)

            (PP-PRP as

               (NP (NP a result)

                   (PP of

                     (NP the earthquake)))))))
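The node labels in the examples above pack several pieces of information into one string: a syntactic category, zero or more function tags, and optionally a numeric index. The following Python sketch shows one way to take such a label apart. It is an assumption-laden illustration covering only the label shapes that appear in this section (including the `=n` form used in the gapping examples); `Label` and `split_label` are names invented here.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Label(NamedTuple):
    category: str                # e.g. "NP", "PP", "S"
    functions: Tuple[str, ...]   # e.g. ("PRD", "TPC")
    index: Optional[int]         # the n of a trailing "-n", if any
    gap_index: Optional[int]     # the n of a trailing "=n", if any

_LABEL = re.compile(
    r"^(?P<cat>[A-Z]+)"          # category
    r"(?P<tags>(?:-[A-Z]+)*)"    # function tags, each introduced by "-"
    r"(?:-(?P<index>\d+))?"      # optional coindexation number
    r"(?:=(?P<gap>\d+))?$"       # optional "=n" number
)

def split_label(label: str) -> Label:
    """Split a label such as 'ADVP-PRD-TPC-1' into its components."""
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    tags = tuple(t for t in m.group("tags").split("-") if t)
    index = int(m.group("index")) if m.group("index") else None
    gap = int(m.group("gap")) if m.group("gap") else None
    return Label(m.group("cat"), tags, index, gap)

for raw in ("PP-DTV", "ADVP-PRD-TPC-1", "S-CLF", "NP-SBJ=1"):
    print(raw, "->", split_label(raw))
```

Keeping the function tags as a tuple preserves their order and allows labels with several tags, such as ADVP-PRD-TPC-1, to be handled uniformly.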

Gapping

(S (S (NP-SBJ-1 Mary)

      (VP likes

          (NP-2 Bach)))

   and

   (S (NP-SBJ=1 Susan)

      ,

      (NP=2 Beethoven)))



like(Mary, Bach)

like(Susan, Beethoven)

(S (S (NP-SBJ I)

      (VP eat

          (NP-1 breakfast)

          (PP-TMP-2 in

               (NP the morning))))

   and

   (S (NP=1 lunch)

      (PP-TMP=2 in

            (NP the afternoon))))
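In the gapping notation, a constituent labelled `=n` in the second conjunct fills the same slot as the constituent labelled `-n` in the first. The Python sketch below pairs them up for the Mary/Susan example. It is a simplified illustration that assumes exactly two S conjuncts directly under the top node, and its function names are invented here.

```python
import re

def parse(text):
    """Parse one bracketed tree into (label, children); leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1
        return (label, children)

    return node()

def words(tree):
    out = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else words(child))
    return out

def indexed(tree, sep):
    """Map index -> text for constituents whose label ends in sep + digits."""
    found = {}

    def walk(node):
        label, children = node
        m = re.search(re.escape(sep) + r"(\d+)$", label)
        if m:
            found[m.group(1)] = " ".join(words(node))
        for child in children:
            if not isinstance(child, str):
                walk(child)

    walk(tree)
    return found

def pair_gapped(tree):
    """Pair each -n slot of the first conjunct with the =n filler of the second."""
    conjuncts = [c for c in tree[1] if not isinstance(c, str) and c[0] == "S"]
    template, gapped = conjuncts[0], conjuncts[1]
    slots = indexed(template, "-")
    fillers = indexed(gapped, "=")
    return {i: (slots[i], fillers.get(i, slots[i])) for i in sorted(slots)}

GAP = parse("(S (S (NP-SBJ-1 Mary) (VP likes (NP-2 Bach))) and "
            "(S (NP-SBJ=1 Susan) , (NP=2 Beethoven)))")
print(pair_gapped(GAP))
# {'1': ('Mary', 'Susan'), '2': ('Bach', 'Beethoven')}
```

Substituting the second member of each pair into the first conjunct gives the material needed to read off the second predication, with the verb carried over from the first conjunct.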

However, there is no recovery of structure outside the single sentence concerned.

Who threw the ball?



(FRAG (NP Chris)

      ,

      (NP-TMP yesterday))



What is Tim eating?



(FRAG (NP-SBJ Mary Ann)

      (VP thinks

          (SBAR 0

                (FRAG (NP chocolate)))))


