Optional syntactic annotation

An annotation scheme confined to the recommended category labels above would give a sparse and incomplete representation of the syntactic form of the sentences in a corpus. For many purposes and for many languages, other major constituent labels, such as Auxiliary, Determiner Phrase, or (in English) Genitive Phrase, will be found necessary. We will say no more of these here.

Sentence subcategorisation

It is common practice in most existing schemes to subcategorise sentences further. [S] can be used for all sentences, or it may be restricted to simple declarative sentences or imperatives and to questions with declarative word order. Other labels may be introduced: for example, [SI] may be used to mark sentences whose verb is imperative, and [SQ] to mark questions, including constructions such as 'yes/no' questions and 'tag questions'. Further descriptors may be required to mark kinds of sentences that are included in other sentences, e.g. 'direct quotations' or 'interpolated sentences'. Whenever the need for such refinements arises, they can be introduced into the annotation scheme, but these additions should always be documented.
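One way to keep such additions documented is to hold the label inventory in a single table and check annotated text against it. The sketch below is illustrative only: the inventory `DOCUMENTED_LABELS` and the function `undocumented_labels` are hypothetical, and the inventory contains just the labels mentioned in this section plus NP and VP. It reports any bracket label used in an annotation that has no documented entry.

```python
import re

# Hypothetical documented inventory (label -> short description). A real
# scheme would list every label it uses, with fuller documentation.
DOCUMENTED_LABELS = {
    "S": "sentence",
    "SI": "sentence whose verb is imperative",
    "SQ": "question, e.g. a yes/no question or a tag question",
    "NP": "noun phrase",
    "VP": "verb phrase",
}

# An opening bracket is "[" immediately followed by a label such as "SQ";
# a function suffix like "-Subj" is not part of the captured label.
OPEN_LABEL = re.compile(r"\[([A-Za-z]+)")

def undocumented_labels(annotation, documented=DOCUMENTED_LABELS):
    """Return the set of bracket labels used in `annotation` that have
    no entry in the documented inventory."""
    used = set(OPEN_LABEL.findall(annotation))
    return used - set(documented)

print(undocumented_labels("[SI [VP Close [NP the door NP] VP] SI]"))  # set()
print(undocumented_labels("[SX [NP Hello NP] SX]"))                    # {'SX'}
```

Only opening brackets are inspected, on the assumption that each closing label repeats its opening label, as in the examples in this document.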


Syntactic clause subcategorisation

Clause was introduced as a recommended category above. Clauses may traditionally be subclassified further by syntactic properties (such as whether the verb is finite or non-finite, or whether the clause is verbless), or by functional properties (e.g. whether an embedded clause is nominal, adverbial, relative or comparative; see borderline categories). Further possibilities for subcategorisation may be language-specific. For example, in English it is common practice to categorise clauses according to their introductory subordinator, e.g. that-clauses or wh-clauses.

At this level of analysis, the most useful distinctions are the purely syntactic ones of finite, non-finite, and verbless. The functional properties mentioned above depend more on the semantic role of the clause in question, and will therefore be dealt with later. These syntactic distinctions may be further subdivided: e.g. within non-finite clauses one can distinguish infinitive clauses, gerundival or participial clauses, and past participial clauses.
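Because these distinctions are purely syntactic, they can often be read off the POS tags of a clause. The following is a minimal heuristic sketch for English, not a recommended procedure: it assumes a hypothetical simplified tagset (VFIN, VINF, VING, VPP) and assumes the tags of any embedded clauses have already been excluded. The first verb form in the clause decides the class and, for non-finite clauses, the subtype.

```python
# Hypothetical simplified verb tags, assumed for this sketch only:
#   VFIN = finite verb, VINF = infinitive, VING = -ing form, VPP = past participle.
# All other tags (e.g. TO, DET, N, ADJ) are treated as non-verbal.
NONFINITE_SUBTYPES = {
    "VINF": "infinitive",
    "VING": "gerundival/participial",
    "VPP": "past participial",
}

def classify_clause(tags):
    """Return (class, subtype) for a clause, given the POS tags of its own
    words with any embedded clauses excluded. The first verb form decides:
    a finite verb makes the clause finite, a non-finite form makes it
    non-finite with that subtype, and no verb at all makes it verbless."""
    for tag in tags:
        if tag == "VFIN":
            return ("finite", None)
        if tag in NONFINITE_SUBTYPES:
            return ("non-finite", NONFINITE_SUBTYPES[tag])
    return ("verbless", None)

# "to have written the letter"
print(classify_clause(["TO", "VINF", "VPP", "DET", "N"]))  # ('non-finite', 'infinitive')
# "though tired"
print(classify_clause(["CONJ", "ADJ"]))                    # ('verbless', None)
```

Letting the first verb decide means that "to have written" counts as infinitive rather than past participial, since the auxiliary carries the non-finite form that defines the clause.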

Syntactic phrase subcategorisation

Phrases may be further subcategorised to show syntactic features such as gender, person and number. Thus a Noun Phrase may be marked as singular, masculine, etc. However, as morpho-syntactic tagging currently tends to be a preliminary to syntactic annotation, this type of information can often be derived from the POS-tag of the head of the phrase. The grammatical features of the head of the constituent may therefore be percolated up to the highest node of that constituent.

In (54), the noun filles carries the subcategorisation features 3rd Person Plural Feminine in its tag:

(54)  [NP Les filles_N3PlFem NP] [VP ont écrit [NP les lettres NP] VP]
In (55), these features have been transferred from word level to phrase level:
(55)  [NP Les filles NP-3PlFem] [VP ont écrit [NP les lettres NP] VP]
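The transfer from (54) to (55) is mechanical once the head is known. The sketch below assumes a tag shape modelled on (54) (upper-case POS letters followed by a person digit and number/gender values) and a flat phrase whose head position is supplied by the caller; the names `head_features` and `percolate` are hypothetical.

```python
import re

# Assumed tag shape, following example (54): upper-case POS letters, then a
# person digit, then number/gender values, e.g. "N3PlFem" or "N3MascSg".
HEAD_TAG = re.compile(r"^([A-Z]+)([123](?:Sg|Pl|Masc|Fem)+)$")

def head_features(tag):
    """Strip the POS letters from a head tag, keeping its feature string."""
    m = HEAD_TAG.match(tag)
    if m is None:
        raise ValueError(f"unrecognised head tag: {tag!r}")
    return m.group(2)

def percolate(label, tokens, head_index):
    """Render a flat phrase as in example (55): words lose their tags and
    the head's features are attached to the closing phrase label.
    `tokens` is a list of (word, tag) pairs; only the head's tag is read."""
    features = head_features(tokens[head_index][1])
    words = " ".join(word for word, _tag in tokens)
    return f"[{label} {words} {label}-{features}]"

print(percolate("NP", [("Les", "DET"), ("filles", "N3PlFem")], head_index=1))
# [NP Les filles NP-3PlFem]
```

The feature string is copied verbatim rather than re-ordered, so whatever convention the tagset uses is preserved at phrase level.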
In coordinated NPs, the conjuncts may differ in one or more of these features, and languages vary as to which feature values the phrase as a whole takes. In French, for example (see 56 and 57), when one conjunct in a coordinated NP is masculine singular and another is feminine singular, the phrase as a whole is marked as masculine plural:

(56)  [NP[NP Le garçon_N3MascSg NP] et [NP la fille_N3FemSg NP] NP] ...
(57)  [NP[NP Le garçon_N3MascSg NP] et [NP la fille_N3FemSg NP] NP-3MascPl] ...
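The French rule illustrated by (56) and (57) can be written as a small resolution function over the conjuncts' feature strings. This is a sketch under stated assumptions: it covers coordination with et only, the function name is hypothetical, and the person rule (the lowest-numbered person wins, as in toi et moi, nous) goes beyond (56) and (57), which involve only 3rd-person conjuncts.

```python
import re

# Assumed feature-string shape, as in examples (56)-(57): a person digit, then
# number and gender values in either order, e.g. "3MascSg" or "3PlFem".
FEATURES = re.compile(r"^([123])((?:Sg|Pl|Masc|Fem)+)$")
VALUE = re.compile(r"Sg|Pl|Masc|Fem")

def resolve_french_coordination(conjunct_features):
    """Features for an NP of the form 'X et Y (et Z ...)'.

    Number: always plural. Gender: masculine if any conjunct is masculine,
    otherwise feminine. Person: the lowest-numbered person among the
    conjuncts (an assumption beyond examples (56)-(57)).
    The output order (person, gender, number) follows example (57).
    """
    if len(conjunct_features) < 2:
        raise ValueError("coordination needs at least two conjuncts")
    persons, genders = [], []
    for feats in conjunct_features:
        m = FEATURES.match(feats)
        if m is None:
            raise ValueError(f"unrecognised feature string: {feats!r}")
        values = VALUE.findall(m.group(2))
        if "Masc" in values:
            genders.append("Masc")
        elif "Fem" in values:
            genders.append("Fem")
        else:
            raise ValueError(f"no gender value in {feats!r}")
        persons.append(int(m.group(1)))
    gender = "Masc" if "Masc" in genders else "Fem"
    return f"{min(persons)}{gender}Pl"

print(resolve_french_coordination(["3MascSg", "3FemSg"]))  # 3MascPl
```

Other languages, or disjunctive coordination, would need their own resolution functions.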
Other phrases may also be subcategorised syntactically. For example, a Verb Phrase may be marked according to tense, mood, aspect, voice, etc., as well as according to the complementation frame it represents (transitive, intransitive, copular, etc.).

Grammatical function

Optionally, syntactic functions can be assigned to constituents. At the rank of sentence or clause, the Lexicon/Syntax SubGroup marks only one grammatical function, namely +/-Subject; other grammatical functions may be derived from the combination of several other syntactic features with the property +/-Subject. For annotated corpora, we propose Subject, Object, Indirect Object (where it applies in a language) and Adjunct. If used, these labels may be attached to the constituent label with a hyphen, as in 58:

(58)  [S [NP-Subj John NP-Subj] [VP gave [NP-Obj a book NP-Obj] [PP-IndObj to [NP Mary NP] PP-IndObj] [PP-Adjct on [NP Monday NP] PP-Adjct] VP] . S]
At the phrase level, other functions, such as head and modifier, are commonly recognised.
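The hyphenation convention of (58) can be generated from a tree rather than typed by hand. The following sketch assumes a hypothetical `Constituent` class; the suffixes Subj, Obj, IndObj and Adjct are those used in (58), and the function-bearing label is repeated at the closing bracket as in that example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Suffixes for the four proposed functions, as abbreviated in example (58).
FUNCTION_SUFFIX = {
    "Subject": "Subj",
    "Object": "Obj",
    "Indirect Object": "IndObj",
    "Adjunct": "Adjct",
}

@dataclass
class Constituent:
    label: str                                  # e.g. "S", "NP", "VP", "PP"
    children: List[Union[str, "Constituent"]] = field(default_factory=list)
    function: Optional[str] = None              # a FUNCTION_SUFFIX key, or None

    def full_label(self) -> str:
        """Constituent label, hyphenated with a function suffix if one is set
        (raises KeyError for a function outside the proposed four)."""
        if self.function is None:
            return self.label
        return f"{self.label}-{FUNCTION_SUFFIX[self.function]}"

    def render(self) -> str:
        """Bracket notation with the full label at both brackets."""
        inner = " ".join(c if isinstance(c, str) else c.render() for c in self.children)
        tag = self.full_label()
        return f"[{tag} {inner} {tag}]"

# Rebuilding example (58).
sentence = Constituent("S", [
    Constituent("NP", ["John"], "Subject"),
    Constituent("VP", [
        "gave",
        Constituent("NP", ["a book"], "Object"),
        Constituent("PP", ["to", Constituent("NP", ["Mary"])], "Indirect Object"),
        Constituent("PP", ["on", Constituent("NP", ["Monday"])], "Adjunct"),
    ]),
    ".",
])
print(sentence.render())
```

Keeping the function separate from the category label in the data structure means either can be queried independently, and the hyphenated form is produced only at output time.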

