
Recommendations

Dealing with ambiguity

Ambiguity, in contrast to underspecification, is a lack of information that leaves uncertainty between two or more alternative descriptions. Four different senses of ambiguity can be distinguished in morphosyntactic tagging.

Grammatical homonymy

The English word round has five potential tags: it can be

  1. A preposition;
  2. An adverb/particle;
  3. An adjective;
  4. A noun; or
  5. A verb.
Normally, this type of ambiguity (if it is regarded as ambiguity at all) does not appear in an annotated corpus, because tagging resolves it: each occurrence of round receives just one of these tags.
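The minimal Python sketch below makes this distinction concrete. It assumes a toy lexicon with hypothetical tag names (PREP, ADV, ADJ, NOUN, VERB), not any real tagset. The lexicon lists every potential tag for a word form, while each token in the annotated corpus carries exactly one tag chosen from that set. The names LEXICON, potential_tags and annotate are illustrative only.

```python
# Hypothetical toy lexicon: each word form maps to the set of its potential tags.
# The tag names are illustrative only and do not come from any particular tagset.
LEXICON: dict[str, frozenset[str]] = {
    "round": frozenset({"PREP", "ADV", "ADJ", "NOUN", "VERB"}),
}


def potential_tags(word: str) -> frozenset[str]:
    """Return every tag the lexicon allows for a word form (its grammatical homonymy)."""
    return LEXICON.get(word.lower(), frozenset())


def annotate(word: str, chosen: str) -> tuple[str, str]:
    """Record a resolved annotation: exactly one tag, drawn from the potential tags."""
    if chosen not in potential_tags(word):
        raise ValueError(f"{chosen!r} is not a potential tag for {word!r}")
    return (word, chosen)


# In "a round table" the homonymy is resolved in favour of the adjective reading.
print(sorted(potential_tags("round")))  # ['ADJ', 'ADV', 'NOUN', 'PREP', 'VERB']
print(annotate("round", "ADJ"))  # ('round', 'ADJ')
```

The ambiguity lives in the lexicon. The annotated corpus records only the resolved choice.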

Portmanteau tags

However, large corpora are tagged automatically, and there may be neither the need nor the opportunity to post-edit the whole corpus manually. In such cases it can be practical to retain more than one tag in the annotated corpus wherever the automatic tagging algorithms have not provided strong enough evidence for disambiguation. For example, the British National Corpus uses a set of portmanteau tags to record such ambiguities. One of them is the tag VVD-VVN, which means ``either the past tense or the past participle of a lexical verb''. In the annotated British National Corpus, the portmanteau tag appears in the TEI format of an entity reference appended to the word, e.g.: liked&VVD-VVN;. Other formats of presentation would also be reasonable. A portmanteau tag signals uncertainty about the appropriate tag because automatic processing is fallible. It is assumed that a trained human post-editor would in general have no difficulty in resolving the ambiguity.
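The minimal Python sketch below shows one way such annotations might be processed. It splits a token written in the entity-reference form described above (e.g. liked&VVD-VVN;) into the word and its candidate tags. It rests on three assumptions:

* the reference directly follows the word;
* candidate tags are separated by hyphens;
* each individual tag consists of capital letters and digits.

The function names split_portmanteau and to_entity_form are hypothetical, and this is not an official BNC tool.

```python
import re
from typing import List, Optional, Tuple

# Assumed layout: a word form immediately followed by "&TAG1-TAG2;" (two or more tags).
# This pattern is an illustrative sketch, not an official BNC parser.
PORTMANTEAU = re.compile(r"^(?P<word>[^&]+)&(?P<tags>[A-Z0-9]+(?:-[A-Z0-9]+)+);$")


def split_portmanteau(token: str) -> Optional[Tuple[str, List[str]]]:
    """Return (word, candidate tags) for a portmanteau-tagged token, else None."""
    match = PORTMANTEAU.match(token)
    if match is None:
        return None
    return match.group("word"), match.group("tags").split("-")


def to_entity_form(word: str, candidates: List[str]) -> str:
    """Inverse operation: append the candidate tags to the word as an entity reference."""
    return f"{word}&{'-'.join(candidates)};"


print(split_portmanteau("liked&VVD-VVN;"))  # ('liked', ['VVD', 'VVN'])
print(split_portmanteau("liked"))  # None
print(to_entity_form("liked", ["VVD", "VVN"]))  # liked&VVD-VVN;
```

Because the two functions are inverses under these assumptions, the same candidate list could also be written out in any of the other presentation formats the text allows.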

Human uncertainty ambiguities

A further type of ambiguity may arise where the human annotator cannot decide on a single appropriate tag. There may be good reasons for this type of indecision:

At the present stage of development of morphosyntactic tagging, the ability to deal with this kind of ambiguity is not a high priority, although it may become more important in the future.

Genuine textual ambiguities

By this we mean cases where the text does not provide enough information for disambiguation between two or more clearly defined categories. For example, it may be unclear whether in a given case the exclamatory word Fire! is a verb or a noun. Ideally, in such cases, more than one tag should be attached to the same textword.

The encoding of ambiguity in morphosyntactic annotation has so far received little attention. We make no recommendations except to propose that, in principle, all the kinds of ambiguity listed above should be distinguishable by different mark-up.
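The recommendation deliberately leaves the concrete mark-up open. Purely as an illustration, the Python sketch below shows one way of keeping the kinds of unresolved ambiguity distinguishable. Each tagged word carries either a single resolved tag, or two or more candidate tags together with the kind of ambiguity. Grammatical homonymy has no marker, because tagging resolves it. Everything else is hypothetical: the class names, the markers auto, human and text, and the word_kind:TAG|TAG syntax. The C5-style tags are used only as examples.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AmbiguityKind(Enum):
    """Kinds of unresolved ambiguity that may survive into an annotated corpus.

    Grammatical homonymy is deliberately absent: tagging resolves it.
    """

    PORTMANTEAU = "auto"  # automatic tagger lacked evidence; a post-editor could resolve it
    HUMAN_UNCERTAINTY = "human"  # the human annotator could not decide
    TEXTUAL = "text"  # the text itself does not supply enough information


@dataclass(frozen=True)
class TaggedWord:
    word: str
    tags: tuple[str, ...]
    kind: Optional[AmbiguityKind] = None  # None: a single, resolved tag

    def __post_init__(self) -> None:
        # A resolved word carries exactly one tag; an ambiguous one needs two or more.
        if self.kind is None and len(self.tags) != 1:
            raise ValueError("a resolved word must carry exactly one tag")
        if self.kind is not None and len(self.tags) < 2:
            raise ValueError("an ambiguous word must carry at least two tags")

    def to_markup(self) -> str:
        """Serialise with a kind-specific marker (hypothetical syntax, not a standard)."""
        if self.kind is None:
            return f"{self.word}_{self.tags[0]}"
        return f"{self.word}_{self.kind.value}:{'|'.join(self.tags)}"


examples = [
    TaggedWord("round", ("AJ0",)),
    TaggedWord("liked", ("VVD", "VVN"), AmbiguityKind.PORTMANTEAU),
    TaggedWord("Fire", ("VVB", "NN1"), AmbiguityKind.TEXTUAL),
]
for tagged in examples:
    print(tagged.to_markup())
# round_AJ0
# liked_auto:VVD|VVN
# Fire_text:VVB|NN1
```

The kind is stored separately from the tags. A post-editing tool could therefore, for instance, select only the auto cases for manual resolution and leave genuine textual ambiguities untouched.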

