Syntactic annotation of corpora, grammars and parsing

Recommendations

The scope of this report is the syntactic annotation of corpora. At first glance, a study of annotation practices is difficult to distinguish from a study of parsers, parsing, grammars, the representation of parses, and the formalisms adopted for such representations. Clearly, however, we cannot cover all of these in this report. Computational parsing and implementable formal grammars have been the subject of concentrated research for decades, and the literature of the field is immense. The syntactic annotation of corpora, by contrast, is a comparatively new field: apart from work on English, relatively little syntactic annotation of corpora has been carried out on other European languages.

The syntactic annotation of corpora is nevertheless closely related to parsing; indeed, a major function of a syntactically annotated corpus is to provide a test-bed or a training-bed for wide-coverage parsers. This relation cannot be ignored in the report. What we are ultimately interested in, however, are the parsing schemes in use to date (i.e. the sets of symbols used in annotation and the guidelines for their application), although how the corpus is parsed (the parsing system) is relevant, albeit indirectly, to our task. For example, robust parsers used for corpus annotation may be rule-based, probabilistic, or may employ a combination of methods, and these alternatives may affect the consistency or reproducibility of the resulting annotation. The parsing system is thus relevant because the method of parsing is related both to what is actually parsed and to the nature of the resulting resource.