Common standards for specifying annotation schemes and their application to texts

Recommendations

The final area for standardisation that we consider here appears to be the most difficult to achieve, if it is equated with laying down rules for consistency in the application of tags to texts. Take the apparently favourable area of morphosyntactic annotation: ideally, an annotation scheme would be specified so precisely that two annotators, independently applying the same scheme to the same text, would arrive at exactly the same result, with each word-token in the text receiving the same tag from both. In practice, however, there are always "fuzzy boundaries" between categories, such as the uncertainty (in English) over whether to regard "gold" in "a gold watch" as an adjective or a noun. Decisions on such matters have to be specified in the annotation scheme, which should also deal with general issues such as whether functional or formal definitions of tags are to be adopted, or whether both function and form have to be represented in the annotation. Individual words may need to be discussed where their assignment to one category or another is problematic. But however detailed the annotation scheme, new phenomena not covered by existing guidelines are always liable to occur.

Issues such as these cannot be decided in the abstract, in a way that generalises across languages and across annotation tasks. The need for this kind of standardisation is best met not by laying down detailed specifications of how a given category is applied in the tagging of a given word, but by recommending that a sufficiently detailed annotation scheme be made available to users of the annotated corpus. There is little possibility of securing detailed agreement between different annotators on how to apply tags to texts, particularly if different languages are involved. But one can at least ensure that the user is provided with information, as detailed as possible, about how annotations have been applied to texts.
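
The ideal described above, in which two annotators independently assign the same tag to every word-token, can be checked mechanically once both taggings exist. The following is a minimal sketch in Python, not a procedure taken from these recommendations: the function name `compare_annotations` and the tag labels `DET`, `ADJ` and `NOUN` are hypothetical, chosen only to illustrate the "gold watch" case. It reports the proportion of tokens on which the two annotators agree and lists the tokens where they differ, which are exactly the cases that a sufficiently detailed annotation scheme would need to document.

```python
def compare_annotations(tokens, tags_a, tags_b):
    """Compare two independent taggings of the same token sequence.

    Returns (agreement, disagreements): the proportion of tokens given the
    same tag by both annotators, and a list of (index, token, tag_a, tag_b)
    for every token on which they differ.
    """
    if not (len(tokens) == len(tags_a) == len(tags_b)):
        raise ValueError("each annotator must supply exactly one tag per token")

    disagreements = [
        (i, tok, a, b)
        for i, (tok, a, b) in enumerate(zip(tokens, tags_a, tags_b))
        if a != b
    ]
    # An empty text is treated as full agreement by convention.
    agreement = 1.0 if not tokens else 1 - len(disagreements) / len(tokens)
    return agreement, disagreements


# Hypothetical tag labels; any real scheme defines its own tagset.
tokens = ["a", "gold", "watch"]
annotator_1 = ["DET", "ADJ", "NOUN"]
annotator_2 = ["DET", "NOUN", "NOUN"]

agreement, disagreements = compare_annotations(tokens, annotator_1, annotator_2)
print(f"agreement: {agreement:.2f}")
for index, token, tag_1, tag_2 in disagreements:
    print(f"token {index} ({token!r}): {tag_1} vs {tag_2}")
```

A comparison of this kind only reveals where annotators diverge; it does not say which tag is right. That decision still has to be recorded in the annotation scheme supplied to users of the corpus.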
