
Recommendations

 

Underspecification

Underspecification can be characterised as incompleteness of annotation: a piece of information that could have been specified in the annotation of a corpus is lacking, for one of a number of possible reasons.

Underspecification, at one level, is a necessary and even benign phenomenon. Not all syntactic details (e.g. of subcategorisation of Noun Phrases or Verb Phrases) can be included in any realistic corpus annotation task. It is normal for a parsing scheme to be selective in specifying the syntactic phenomena to be identified and labelled, and omission of some details is almost inevitable.

At another level, underspecification may show unevenness in the degree of detail achieved, and this may be an unavoidable consequence of limitations of human resources, time, software capabilities, etc. For example, it may be possible to mark pronominal Noun Phrases for number, but impossible to mark full Noun Phrases for number, within an acceptable threshold of accuracy. In this case, number will be marked only for some Noun Phrases. Another case of a similar kind occurs where some of the constituents of a sentence are labelled, and others are left unlabelled. This kind of 'information gap' should be documented in the annotation scheme, and inconsistency in its occurrence should if possible be avoided.
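One way to check whether such a gap occurs consistently is to count, for each class of constituent, how often a given feature is actually marked. The following is a minimal sketch only, not part of any annotation scheme described here: the `Constituent` record, its `subtype` field, the feature name `"number"` and the sample data are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Constituent:
    # Hypothetical record for one annotated constituent.
    label: Optional[str]            # None means the constituent is unlabelled
    subtype: Optional[str] = None   # e.g. "pronominal" or "full" (assumed values)
    features: Dict[str, str] = field(default_factory=dict)


Group = Tuple[Optional[str], Optional[str]]


def coverage(constituents: List[Constituent], feature: str) -> Dict[Group, Tuple[int, int]]:
    """Return (marked, total) counts of `feature` per (label, subtype) group."""
    counts: Dict[Group, List[int]] = defaultdict(lambda: [0, 0])
    for c in constituents:
        key = (c.label, c.subtype)
        counts[key][1] += 1
        if feature in c.features:
            counts[key][0] += 1
    return {key: (marked, total) for key, (marked, total) in counts.items()}


def inconsistent_groups(constituents: List[Constituent], feature: str) -> List[Group]:
    """List groups in which `feature` is marked on some members but not all."""
    return [
        key
        for key, (marked, total) in coverage(constituents, feature).items()
        if 0 < marked < total
    ]


# Invented sample: number is marked on pronominal NPs, unevenly on full NPs.
sample = [
    Constituent("NP", "pronominal", {"number": "sg"}),
    Constituent("NP", "pronominal", {"number": "pl"}),
    Constituent("NP", "full"),
    Constituent("NP", "full", {"number": "sg"}),
    Constituent(None),  # an unlabelled constituent
]

report = coverage(sample, "number")
flagged = inconsistent_groups(sample, "number")
```

In this sketch, a group marked either always or never reflects a gap that the scheme can simply document, whereas a group that appears in `flagged` shows the kind of inconsistent occurrence that the recommendation advises avoiding.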