
Recommendations

Special extensions - Optional language-specific attributes/values

Here we deal with aspects of morphosyntactic annotation which are optional and may be included in the annotation scheme according to need. Many of them go beyond morphosyntax and are syntactic or semantic in nature. The following are examples of special extensions of the tagset which may be needed for specific languages. As with the application- or task-specific features, the examples are purely illustrative and make no claim to completeness. Thus, we do not recommend any of these features. In some cases, they derive from work already done on tagsets and their application to texts. In other cases, they derive from specialist research, or from comments on an early draft of these guidelines supplied by specialists in particular languages.

1. Nouns

[Recommended attributes Generic optional attributes]

An additional language-specific attribute is:

(vi) Definiteness: 1. Definite 2. Indefinite 3. Unmarked [Danish]

This handles the suffixed definite article in Danish, e.g. haven (`the garden') and havet (`the sea').

Additional values:

(ii) Gender: 4. Common   [Danish, Dutch]
(iv) Case: 6. Vocative 7. Indeclinable [Greek]

The Common gender contrasts with Neuter in a two-gender system.
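
As a minimal sketch of how the noun extensions listed above might be held in an annotation tool, the table below transcribes the attribute names, value numbers and language labels from this section. The dictionary layout and the function name extension_label are illustrative assumptions, not part of these guidelines.

```python
# Language-specific noun extensions, transcribed from the list above.
# Keys are (attribute, value number); values are (label, languages).
# The data layout itself is a hypothetical choice made for illustration.
NOUN_EXTENSIONS = {
    ("Definiteness", 1): ("Definite", {"Danish"}),
    ("Definiteness", 2): ("Indefinite", {"Danish"}),
    ("Definiteness", 3): ("Unmarked", {"Danish"}),
    ("Gender", 4): ("Common", {"Danish", "Dutch"}),
    ("Case", 6): ("Vocative", {"Greek"}),
    ("Case", 7): ("Indeclinable", {"Greek"}),
}


def extension_label(attribute, code, language):
    """Return the value label if it is listed for this language, else None."""
    entry = NOUN_EXTENSIONS.get((attribute, code))
    if entry is None:
        return None
    label, languages = entry
    return label if language in languages else None
```

Keeping the language set next to each value means a tagset for one language can reject values that are only listed for another, for example Gender 4 (Common) in a Greek tagset.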

2. Verbs

[Recommended attributes Generic optional attributes]

An additional attribute:

(xiii) Aux.-function: 1. Primary 2. Modal [English]

The primary (non-modal) auxiliaries are be, have and do.

An additional value to the non-finite category of verbs is arguably needed for English, because of the merger in that language of the gerund and participle functions. The -ing form does service for both and the two traditional categories are not easily distinguishable.

(v) Verb-form / Mood: 9. -Ing form [English]

3. Adjectives

[Recommended attributes Generic optional attributes]

Additional values for Case:

(iv) Case: 5. Vocative 6. Indeclinable [Greek]

4. Pronouns and Determiners

[Recommended attributes Generic optional attributes]

An additional value for Gender and for Case:

(ii) Gender: 4. Common [Danish]
(v) Case: 7. Prepositional [Spanish]

An additional attribute:

(xii) Strength 1. Weak 2. Strong [French, Dutch, Greek]

Weak and Strong distinguish, for example, me from moi in French, and me from mij in Dutch.
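
The Strength attribute can be pictured as a lookup from a word form to one of the two values. The sketch below encodes only the French and Dutch examples given above; the table layout and the function name strength_of are assumptions made for illustration, and a real lexicon would need many more entries.

```python
# Value numbers and labels for the Strength attribute, as listed above.
STRENGTH = {1: "Weak", 2: "Strong"}

# Only the example forms mentioned in the text; keyed by (language, form).
# This tiny table is a hypothetical stand-in for a full lexicon.
PRONOUN_STRENGTH = {
    ("French", "me"): 1,
    ("French", "moi"): 2,
    ("Dutch", "me"): 1,
    ("Dutch", "mij"): 2,
}


def strength_of(language, form):
    """Return 'Weak' or 'Strong' for a known form, or None if unlisted."""
    code = PRONOUN_STRENGTH.get((language, form.lower()))
    return STRENGTH.get(code)
```

The key includes the language because the same spelling, here me, occurs in both French and Dutch.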

5. Articles

[Recommended attributes]

Again, additional values for Article-Type, Gender and Case are:

(i) Article-Type: 3. Partitive   [French]
(ii) Gender: 4. Common   [Danish]
(iv) Case: 5. Vocative 6. Indeclinable [Greek]

6. Adverbs

[Recommended attributes Generic optional attributes]

Additional values for Adverb-Type:

(ii) Adverb-Type: 3. Particle 4. Pronominal [English, German, Dutch]

In some tagging schemes, especially for English, a particle such as out, off or up counts as a subclass of adverb. In other tagging schemes, the particle may be treated under Residual (Explanation) as a special word-class. German and Dutch have pronominal adverbs such as German daran, davon, dazu.

7. Adpositions

[Recommended attributes Generic optional attributes]

Values for Adposition-Type, in addition to 1. Preposition and 2. Fused-preposition:

(i) Type: 3. Postposition 4. Circumposition [German, English]

German entlang is a Postposition. Arguably, the 's which forms the genitive in English is no longer a case marking but an enclitic postposition, as in the Secretary of State's decision or in a month or so's time. German (auf...) hin is an example which can be analysed as a Circumposition.

8. Conjunctions

[Recommended attributes Generic optional attributes]

An additional attribute, applying to subordinating conjunctions only:

(iii) Subord.-type: 1. With-finite 2. With-infin. 3. Comparative [German]

For example, in German, weil introduces a clause with a finite verb, whereas ohne (zu...) is followed by an infinitive, and als is followed by various kinds of comparative clause (including clauses without finite verbs).

11. Unique/Unassigned

[Explanation Recommended attributes]

The following miscellaneous values may occur:

(i) Unique-type: 1. Infinitive marker [German zu, Danish at, Dutch, English]
  2. Negative particle [English not, n't]
  3. Existential marker [English there, Danish der]
  4. Second negative particle [French pas]
  5. Anticipatory er [Dutch]
  6. Mediopassive voice marker se [Portuguese]
  7. Preverbal particle [Greek]
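
The Unique-type values above form a small closed set, so they can be sketched as an enumeration. In the illustration below, the value numbers and the example word forms are taken from the list; the names UniqueType, UNIQUE_FORMS and candidate_unique_type are hypothetical. Languages listed without an example form (Dutch and English for the infinitive marker, Greek for the preverbal particle) are deliberately left out of the form table.

```python
from enum import IntEnum


class UniqueType(IntEnum):
    """Unique-type values, numbered as in the list above."""
    INFINITIVE_MARKER = 1
    NEGATIVE_PARTICLE = 2
    EXISTENTIAL_MARKER = 3
    SECOND_NEGATIVE_PARTICLE = 4
    ANTICIPATORY_ER = 5
    MEDIOPASSIVE_VOICE_MARKER = 6
    PREVERBAL_PARTICLE = 7


# Example forms named in the list, keyed by (language, form).
UNIQUE_FORMS = {
    ("German", "zu"): UniqueType.INFINITIVE_MARKER,
    ("Danish", "at"): UniqueType.INFINITIVE_MARKER,
    ("English", "not"): UniqueType.NEGATIVE_PARTICLE,
    ("English", "n't"): UniqueType.NEGATIVE_PARTICLE,
    ("English", "there"): UniqueType.EXISTENTIAL_MARKER,
    ("Danish", "der"): UniqueType.EXISTENTIAL_MARKER,
    ("French", "pas"): UniqueType.SECOND_NEGATIVE_PARTICLE,
    ("Dutch", "er"): UniqueType.ANTICIPATORY_ER,
    ("Portuguese", "se"): UniqueType.MEDIOPASSIVE_VOICE_MARKER,
}


def candidate_unique_type(language, form):
    """Return the Unique-type a form may carry, or None if it is not listed.

    This is only a candidate: several of these forms also have other uses
    that a tagger would have to separate by context.
    """
    return UNIQUE_FORMS.get((language, form.lower()))
```

A lookup by form alone cannot decide the tag, which is why the function returns a candidate rather than a final analysis.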

