Word categories: Tagset guidelines

Four degrees of constraint are recognised in the description of word categories by means of morphosyntactic tags:

  1. Obligatory attributes or values must be included in any morphosyntactic tagset. The major parts of speech (Noun, Verb, Conjunction, etc.) belong here.
  2. Recommended attributes or values are widely recognised grammatical categories which occur in conventional grammatical descriptions (e.g. Gender, Number, Person).
  3. Special extensions are subdivided into two further degrees of constraint:

    a. Generic attributes or values are not usually encoded, but may be included by anyone tagging a corpus for a particular purpose. For example, it may be desirable for some purposes to mark semantic classes such as temporal nouns, manner adverbs, place names, etc. The guidelines do not specify these features, except for exemplification purposes; they are purely optional.

    b. Language-specific attributes or values may be important for a particular language, or perhaps for two or three languages at most, but do not apply to the majority of European languages.

In practice, generic and language-specific features cannot be clearly distinguished.

Type 3 (special extensions) is an acknowledgement that the guidelines are not closed, but allow modification according to need. The four types above correspond to the four types of constraint applied to word categorisation in the lexicon. In general, this document repeats (in a somewhat different form) much of the material dealing with morphosyntactic categorisation in the lexicon, where further information on particular features of the classification can be obtained.
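
As an illustration only, the four degrees of constraint could be modelled in a tagging tool roughly as follows. This is a minimal sketch, not part of the guidelines: the names Constraint, Attribute, TAGSET, missing_obligatory and is_special_extension are hypothetical, as are the attribute identifiers "semantic_class" and "lang_specific_feature", which merely stand in for a generic and a language-specific extension.

```python
from dataclasses import dataclass
from enum import Enum


class Constraint(Enum):
    """The four degrees of constraint described in the text."""
    OBLIGATORY = "obligatory"
    RECOMMENDED = "recommended"
    GENERIC = "generic"                      # special extension
    LANGUAGE_SPECIFIC = "language-specific"  # special extension


@dataclass(frozen=True)
class Attribute:
    name: str
    constraint: Constraint


# Hypothetical attribute inventory; only the part of speech is obligatory.
TAGSET = [
    Attribute("pos", Constraint.OBLIGATORY),
    Attribute("gender", Constraint.RECOMMENDED),
    Attribute("number", Constraint.RECOMMENDED),
    Attribute("person", Constraint.RECOMMENDED),
    Attribute("semantic_class", Constraint.GENERIC),
    Attribute("lang_specific_feature", Constraint.LANGUAGE_SPECIFIC),
]


def missing_obligatory(tag):
    """Return the names of obligatory attributes absent from a tag (a dict)."""
    return [
        a.name
        for a in TAGSET
        if a.constraint is Constraint.OBLIGATORY and a.name not in tag
    ]


def is_special_extension(attribute):
    """True for generic and language-specific attributes alike."""
    return attribute.constraint in (
        Constraint.GENERIC,
        Constraint.LANGUAGE_SPECIFIC,
    )


# A tag with the obligatory part of speech plus one recommended attribute.
complete_tag = {"pos": "Noun", "number": "Plural"}
# A tag that omits the part of speech.
incomplete_tag = {"gender": "Feminine"}
```

In this sketch, missing_obligatory(complete_tag) returns an empty list, while missing_obligatory(incomplete_tag) reports "pos". Treating generic and language-specific attributes through the single check is_special_extension reflects the remark that the two cannot be clearly distinguished in practice.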