
Recommendations

Harmonisation with proposals of the Lexicon Working Group

Like the lexicon guidelines, the morphosyntactic tagging guidelines:

  1. Make use of an attribute-value formalism.
  2. Do not adhere to a strict attribute-value hierarchy (in terms of monotonic inheritance).
  3. Use three levels of constraint (obligatory, recommended and optional) in defining what is acceptable according to the guidelines.
  4. Subdivide the optional level into two types of optional extension to tagsets:

    1. Extensions to deal with phenomena which are marginal to morphosyntactic annotation strictly defined, but common to a number of languages (e.g. the distinction between countable and mass nouns);
    2. Extensions to deal with phenomena which are specific to particular EU languages.

A few words may be added regarding each of these points:

  1. At a descriptive level, morphosyntactic tags are defined as sets of attribute-value pairs, although at the 'visible' character-coding level they need not be symbolised as such.
  2. For an individual language, it may be an important step to formalise the tagset as an attribute-value hierarchy. However, this degree of formalisation is not appropriate to the cross-linguistic level of abstraction, where we are specifying guidelines to apply to all EU languages.
  3. The obligatory level of constraint is limited to the major categorisations of parts of speech as Noun, Verb, Conjunction, etc. The recommended level of constraint applies to well-known attributes used widely in the description of European languages: e.g. (for nouns) Number, Gender and Case.
  4. At the optional level, the guidelines clearly have a weaker import, and should not be regarded as mandatory in any sense, but simply as a presentation of possibilities sanctioned by current practice.
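
As an illustration of points 1 and 3 above, the following Python sketch represents a tag as a set of attribute-value pairs and records the level of constraint attached to each attribute. It is only a sketch under stated assumptions: the names `Level`, `NOUN_ATTRIBUTES`, `attributes_at` and `missing_obligatory` are hypothetical, as are the attribute spellings and values. Only the assignment of part of speech to the obligatory level, of Number, Gender and Case to the recommended level, and of the countable/mass distinction to the optional level is taken from the text.

```python
from enum import Enum


class Level(Enum):
    """The three levels of constraint named in the guidelines."""
    OBLIGATORY = "obligatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# Hypothetical attribute inventory for nouns. The attribute names are
# illustrative; only the level assigned to each one follows the text.
NOUN_ATTRIBUTES = {
    "pos": Level.OBLIGATORY,
    "number": Level.RECOMMENDED,
    "gender": Level.RECOMMENDED,
    "case": Level.RECOMMENDED,
    "countability": Level.OPTIONAL,
}


def attributes_at(level):
    """Return the attribute names defined at the given level, sorted."""
    return sorted(name for name, lvl in NOUN_ATTRIBUTES.items() if lvl is level)


def missing_obligatory(tag):
    """Return the obligatory attributes that a tag fails to specify."""
    return [name for name in attributes_at(Level.OBLIGATORY) if name not in tag]


# A tag is a set of attribute-value pairs, here a plain dict. Recommended
# and optional attributes may be left out without violating the guidelines.
tag = {"pos": "Noun", "number": "singular", "gender": "feminine"}
print(missing_obligatory(tag))
```

A flat dictionary is used deliberately here: it expresses attribute-value pairs without committing to an inheritance hierarchy, which matches point 2.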

The tagset guidelines should allow mappings to be stated between the coding of morphosyntactic phenomena in a lexicon and their coding in the morphosyntactic annotation of text corpora. However, because these two activities differ in perspective and goals, the mapping cannot be expected to be straightforward. One suggestion, therefore, is to make the conversion between lexicon and annotation categories easier to specify by making use of an Intermediate Tagset.
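
One possible reading of the Intermediate Tagset suggestion can be sketched in Python as a two-step conversion: lexicon codes are first expanded into attribute-value descriptions, which are then projected onto the attributes that the corpus tagset distinguishes. Everything specific in this sketch is an assumption made for illustration: the lexicon codes (`n-f-sg`, `n-m-pl`), the corpus tags (`NS`, `NP`), the choice of attributes and all function and variable names are hypothetical and are not taken from the guidelines.

```python
# Step 1 (hypothetical): lexicon codes expanded into intermediate
# attribute-value descriptions.
LEXICON_TO_INTERMEDIATE = {
    "n-f-sg": {"pos": "Noun", "gender": "feminine", "number": "singular"},
    "n-m-pl": {"pos": "Noun", "gender": "masculine", "number": "plural"},
}

# Step 2 (hypothetical): the corpus tagset distinguishes only part of
# speech and number, so gender is dropped in the projection.
CORPUS_ATTRIBUTES = ("pos", "number")

CORPUS_CODES = {
    ("Noun", "singular"): "NS",
    ("Noun", "plural"): "NP",
}


def lexicon_to_corpus(lexicon_code):
    """Convert a lexicon code to a corpus tag via the intermediate form."""
    intermediate = LEXICON_TO_INTERMEDIATE[lexicon_code]
    key = tuple(intermediate[name] for name in CORPUS_ATTRIBUTES)
    return CORPUS_CODES[key]


print(lexicon_to_corpus("n-f-sg"))
```

With this arrangement each side states its mapping to the intermediate form only once, so a change to the lexicon coding or to the corpus tagset affects one table rather than a direct lexicon-to-corpus mapping.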

