Next: UPenn Treebank Up: Syntactically annotated corpora Previous: TOSCA annotation scheme

Preliminary Recommendations

SUSANNE annotation scheme

The SUSANNE Corpus (Surface and Underlying Structural Analyses of Naturalistic English) is a 130,000-word corpus annotated with grammatical tags and with both surface and logical grammatical structure. The corpus consists of 64 of the 500 texts in the Brown Corpus, and the annotated version is available from the Oxford Text Archive.

Introduction

The aim of the project was to provide a publicly available standard for grammatical analysis. The scheme is fully explicit and is described in detail in Sampson (1995).

Method of annotation

The corpus was manually annotated by a team of linguists and computer scientists.

Syntactic tags

The syntactic labelling is carried out on three levels:

  1. Surface grammar;
  2. Syntactic function tags;
  3. Logical (or deep) grammar.

Surface grammar

The surface grammar identifies the constituents of a clause and assigns a label to each. The scheme used at this level is very similar to the Lancaster scheme but includes more detail, mainly in the form of further subcategorisation of phrases: for example, a noun phrase may be marked as singular or plural, common or proper. Grammatical function is also recorded at this level: a noun phrase carries a different tag depending on whether it is a subject or a non-subject.

Syntactic function tags

The function tags used in the SUSANNE annotation are shown in Table 3.9.

 

Complements
  :s   logical subject
  :o   logical direct object
  :i   logical indirect object
  :u   prepositional object
  :e   predicate complement of subject
  :j   predicate complement of object
  :a   agent of passive
  :S   surface but not logical subject
  :O   surface but not logical object
  :G   guest constituent
Adjuncts
  :p   place
  :q   direction
  :t   time
  :h   manner or degree
  :m   modality
  :c   contingency
  :r   respect
  :w   comitative
  :k   benefactive
  :b   absolute
Others
  :n   particle of phrasal verb
  :x   propositional relative clause
  :z   complement of catenative
Table 3.9: SUSANNE function tags
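For anyone processing the annotation by program, the tags in Table 3.9 can be held in a small lookup table. The Python sketch below is only an illustration and is not part of the SUSANNE distribution: the names `FUNCTION_TAGS` and `describe_function_tag` are hypothetical, and the descriptions are copied from the table. Note that the tags are case-sensitive (`:s` and `:S` differ).

```python
# Function tags from Table 3.9, grouped as in the table.
# The structure and names here are illustrative assumptions.
FUNCTION_TAGS = {
    "Complements": {
        "s": "logical subject",
        "o": "logical direct object",
        "i": "logical indirect object",
        "u": "prepositional object",
        "e": "predicate complement of subject",
        "j": "predicate complement of object",
        "a": "agent of passive",
        "S": "surface but not logical subject",
        "O": "surface but not logical object",
        "G": "guest constituent",
    },
    "Adjuncts": {
        "p": "place",
        "q": "direction",
        "t": "time",
        "h": "manner or degree",
        "m": "modality",
        "c": "contingency",
        "r": "respect",
        "w": "comitative",
        "k": "benefactive",
        "b": "absolute",
    },
    "Others": {
        "n": "particle of phrasal verb",
        "x": "propositional relative clause",
        "z": "complement of catenative",
    },
}


def describe_function_tag(tag):
    """Return (group, description) for a tag such as ':s' or 's', or None."""
    letter = tag.lstrip(":")  # accept the tag with or without its colon
    for group, tags in FUNCTION_TAGS.items():
        if letter in tags:
            return group, tags[letter]
    return None


if __name__ == "__main__":
    for tag in (":s", ":S", ":t", ":z"):
        print(tag, describe_function_tag(tag))
```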

Logical grammar

Ghost nodes (extra nodes dominating no wording) are added to parse trees to show the logical position of elements that have been moved or deleted in the surface structure. Function tags are added to node labels to mark the logical structure, and indices are added to mark the relationship between nodes that are grammatical counterparts of one another, such as a ghost and the corresponding full surface constituent. Ghost nodes are comparable to the traces used in the UPenn Treebank. Other grammatical constructions marked include the following:

Examples

Table 3.10 gives an example from the SUSANNE corpus (Sampson, 1995) that illustrates the kinds of information aligned for each word. The columns (i.e. fields) contain the following information:

Field 1 -- Text references;
Field 2 -- Part-of-speech tags;
Field 3 -- The text words;
Field 4 -- Base forms (lemmatised forms of Field 3; e.g. `said' is lemmatised as `say');
Field 5 -- Syntactic annotation (brackets and labels).

 

A01:0010a  YB      <minbrk>        -              [Oh.Oh]
A01:0010b  AT      The             the            [O[S[Nns:s.
A01:0010c  NP1s    Fulton          Fulton         [Nns.
A01:0010d  NNL1cb  County          county         .Nns]
A01:0010e  JJ      Grand           grand          .
A01:0010f  NN1c    Jury            jury           .Nns:s]
A01:0010g  VVDv    said            say            [Vd.Vd]
A01:0010h  NPD1    Friday          Friday         [Nns:t.Nns:t]
A01:0010i  AT1     an              an             [Fn:o[Ns:s.
A01:0010j  NN1n    investigation   investigation  .
A01:0020a  IO      of              of             [Po.
A01:0020b  NP1t    Atlanta         Atlanta        [Ns[G[Nns.Nns]
A01:0020c  GG      +<apos>s        -              .G]
A01:0020d  JJ      recent          recent         .
A01:0020e  JJ      primary         primary        .
A01:0020f  NN1n    election        election       .Ns]Po]Ns:s]
A01:0020g  VVDv    produced        produce        [Vd.Vd]
A01:0020h  YIL     <ldquo>         -              .
A01:0020i  ATn     +no             no             [Ns:o.
A01:0020j  NN1u    evidence        evidence       .
A01:0020k  YIR     +<rdquo>        -              .
A01:0020m  CST     that            that           [Fn.
A01:0030a  DDy     any             any            [Np:s.
A01:0030b  NN2     irregularities  irregularity   .Np:s]
A01:0030c  VVDv    took            take           [Vd.Vd]
A01:0030d  NNL1c   place           place          [Ns:o.Ns:o]Fn]Ns:o]Fn:o]S]
A01:0030e  YF      +.              -              .O]
Table 3.10: Aligned information from SUSANNE 
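The five-field layout of Table 3.10 lends itself to simple line-by-line processing. The Python sketch below is an illustration under stated assumptions, not a reader for the distributed corpus files: it assumes the five whitespace-separated fields shown in the table, assumes that the full stop in Field 5 marks the position of the word itself (labels opened before it, labels closed after it), and gives each row's Field 5 on a single line. The sample covers the rows from A01:0010b to A01:0030e, and the function names are hypothetical.

```python
import re

# Rows A01:0010b to A01:0030e of Table 3.10, one word per line.
SAMPLE = """\
A01:0010b AT The the [O[S[Nns:s.
A01:0010c NP1s Fulton Fulton [Nns.
A01:0010d NNL1cb County county .Nns]
A01:0010e JJ Grand grand .
A01:0010f NN1c Jury jury .Nns:s]
A01:0010g VVDv said say [Vd.Vd]
A01:0010h NPD1 Friday Friday [Nns:t.Nns:t]
A01:0010i AT1 an an [Fn:o[Ns:s.
A01:0010j NN1n investigation investigation .
A01:0020a IO of of [Po.
A01:0020b NP1t Atlanta Atlanta [Ns[G[Nns.Nns]
A01:0020c GG +<apos>s - .G]
A01:0020d JJ recent recent .
A01:0020e JJ primary primary .
A01:0020f NN1n election election .Ns]Po]Ns:s]
A01:0020g VVDv produced produce [Vd.Vd]
A01:0020h YIL <ldquo> - .
A01:0020i ATn +no no [Ns:o.
A01:0020j NN1u evidence evidence .
A01:0020k YIR +<rdquo> - .
A01:0020m CST that that [Fn.
A01:0030a DDy any any [Np:s.
A01:0030b NN2 irregularities irregularity .Np:s]
A01:0030c VVDv took take [Vd.Vd]
A01:0030d NNL1c place place [Ns:o.Ns:o]Fn]Ns:o]Fn:o]S]
A01:0030e YF +. - .O]
"""


def parse_row(line):
    """Split one row into its five whitespace-separated fields."""
    ref, pos, word, lemma, parse = line.split()
    return {"ref": ref, "pos": pos, "word": word, "lemma": lemma, "parse": parse}


def brackets(parse):
    """Return (opened, closed) node labels found in a Field 5 value."""
    before, _, after = parse.partition(".")  # '.' stands for the word itself
    opened = re.findall(r"\[([^\[\]]+)", before)
    closed = re.findall(r"([^\[\]]+)\]", after)
    return opened, closed


def check_nesting(rows):
    """Return True if each closing label matches the most recently opened one."""
    stack = []
    for row in rows:
        opened, closed = brackets(row["parse"])
        stack.extend(opened)
        for label in closed:
            if not stack or stack.pop() != label:
                return False
    return not stack  # every opened node must also have been closed


rows = [parse_row(line) for line in SAMPLE.splitlines()]

if __name__ == "__main__":
    print(len(rows), "rows; nesting consistent:", check_nesting(rows))
```

Keeping a stack of open labels is enough here because each closing bracket in Field 5 repeats the label of the node it closes, so mismatches can be detected without building the full tree.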

The following example illustrates the use of a ghost node:

     [Nns:s123 John ] wanted [Ti:o s123 to go ]

In this example, `:s' is the Subject function tag, `:o' the Object tag. `Ti' stands for `infinitival clause'. The `:o' tag on `Ti' indicates that the infinitival clause to go is the Object of wanted. The `s123' ghost node indicates the logical position of the surface Subject John. The number `123' is an index to establish the relation between the ghost and its surface realisation.
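The structure of the labels in this example can be made explicit with a short Python sketch. It is a simplification based only on the example above, not a full account of SUSANNE node labels: it assumes a full label is a form tag made of letters, optionally followed by a colon, a one-letter function tag and a numeric index, and that a ghost is written as a function letter plus index (as in `s123'). The names `parse_label` and `link_ghosts` are hypothetical.

```python
import re

# Assumed label shapes, derived from the example only:
#   full node:  form tag, optional ':' + function letter, optional index
#   ghost node: function letter + index, with no form tag
NODE = re.compile(r"(?P<form>[A-Za-z]+)(?::(?P<func>[A-Za-z])(?P<index>\d+)?)?")
GHOST = re.compile(r"(?P<func>[A-Za-z])(?P<index>\d+)")


def parse_label(label):
    """Break a node label into form tag, function tag and index."""
    m = NODE.fullmatch(label)
    if m:
        return {"form": m["form"], "func": m["func"],
                "index": m["index"], "ghost": False}
    m = GHOST.fullmatch(label)
    if m:
        return {"form": None, "func": m["func"],
                "index": m["index"], "ghost": True}
    raise ValueError(f"unrecognised label: {label!r}")


def link_ghosts(labels):
    """Map each index to its surface constituent and its ghost, if any."""
    links = {}
    for label in labels:
        info = parse_label(label)
        if info["index"] is None:
            continue  # unindexed nodes take part in no ghost relation
        role = "ghost" if info["ghost"] else "surface"
        links.setdefault(info["index"], {})[role] = label
    return links


if __name__ == "__main__":
    # Labels from: [Nns:s123 John ] wanted [Ti:o s123 to go ]
    print(link_ghosts(["Nns:s123", "Ti:o", "s123"]))
```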


