Definitions


Recommendations

Definitions

Syntactic annotation is the practice of adding syntactic information to a corpus by incorporating into the text indicators of syntactic structure: for example, labelled bracketing, or symbols indicating dependency relations between words. Syntactic annotation is here considered to be separate from morphosyntactic annotation (part-of-speech tagging), which indicates the grammatical class of each word token in the corpus (see EAGLES (1996a) for provisional recommendations on the morphosyntactic annotation of corpora). However, the arguments in favour of this separation are chiefly arguments of convenience and practicality: in the corpora annotated so far, grammatical tagging has been the first and easiest level of annotation to be added.
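The two kinds of indicator mentioned above can be illustrated with a minimal Python sketch. The example sentence, the category labels (S, NP, VP), the relation names (det, subj, root) and the helper functions `parse_bracketing` and `leaves` are all hypothetical, chosen for illustration only; they are not taken from any of the annotation schemes discussed in this document.

```python
# Illustrative only: the sentence, labels and relation names are assumptions,
# not drawn from any particular annotation scheme.
BRACKETED = "[S [NP the cat] [VP sat]]"


def parse_bracketing(text):
    """Parse a labelled bracketing into nested (label, children) tuples.

    Assumes well-formed input; unbalanced brackets raise IndexError.
    """
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected an opening bracket")
        pos += 1
        label = tokens[pos]  # the first token after "[" is the category label
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # word token
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing material after the outermost bracket")
    return tree


def leaves(node):
    """Return the word tokens of a parsed tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [word for child in children for word in leaves(child)]


# The same sentence as dependency relations: (dependent, relation, head).
# The root of the sentence has no head, marked here with None.
DEPENDENCIES = [
    ("the", "det", "cat"),
    ("cat", "subj", "sat"),
    ("sat", "root", None),
]

tree = parse_bracketing(BRACKETED)
print(tree)
print(leaves(tree))
```

Both representations annotate the same three word tokens: the bracketing groups them into labelled constituents, while the dependency list links each word directly to its head.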

It is possible to define syntactic annotation so that it subsumes morphosyntactic annotation. For the sake of consistency, we avoid this definition here. However, as mentioned above, the distinction is an artificial one, and morphosyntactic annotation will be referred to within the study of syntactic annotation, as shown later in the classification of existing schemes. In many cases, morphosyntactic annotation is a prerequisite for syntactic annotation, not an integral part of the parsing process. Syntactic annotation is also considered separate from semantic annotation, although the boundary between the two is not always clear (see the discussion of logical or deep structure annotation).

The term 'annotation scheme' is used to refer to a specification of a set of annotation practices employed for a particular corpus. For a syntactic annotation scheme, we may also use the term 'parsing scheme'.

We will be concerned only with annotation schemes, not with the methods used to apply schemes to a corpus. Annotation schemes may be applied in a number of different ways: manually (as in SUSANNE), automatically (as in the English Constraint Grammar (ENGCG)) or interactively (as in TOSCA). A fourth possibility is to parse a corpus automatically and then correct any errors manually (as in the Penn Treebank).
