
Relevant tagset modifications

 

Below, we describe the types of tagset modifications that we test in the practical part of the tagset evaluation. Some changes were motivated by linguistic considerations, others by practical ones (tagging accuracy). Many of them involved changes to the ambiguity classes of a word form or a group of word forms. A list of phenomena is given below in section 5.3.2; more details (including a description of the ``old'' vs. ``new'' practice) can be found in appendix A.

The different kinds of tagset modifications are defined as follows:

Type I: Granularity Changes
Type Ia: Simple granularity changes: Two or more tags for ambiguous word forms are merged, or an ambiguous tag is split into one or more unambiguous tags. As mentioned above, a split is advisable only if the word forms have different distributions. Overall tagging accuracy is evaluated after the change.
Type Ib: More complex granularity changes with overlapping tags. Two ambiguous tags, A and B, are sometimes split into three classes, where the new, third class overlaps in ambiguity with each of the original classes. This is the case with personal and reflexive pronouns. The possible test configurations are more numerous here:
  1. merge the forms into a single ambiguous tag (AB);
  2. split the forms, so that each of the two unambiguous forms has its own tag and all ambiguous forms share a third tag (A, B, AB);
  3. merge the ambiguous forms with either of the unambiguous tags (2 possibilities: ``A, AB'' and ``AB, B'').

Type Ic: External impact of granularity changes: Granularity changes or definition changes affect not only the ambiguity classes immediately involved (internal), but may also influence other ambiguity classes. It might thus be worth splitting an unambiguous class if the split can be expected to improve the disambiguation of other classes.

Type II: Changes in the assignment of word forms to tags. The borderline between two tags is shifted (in the guidelines and in the annotation of the reference text), but the tags involved otherwise remain the same. An example from German: it is very difficult to distinguish lexicalized participles used adverbially (ADJD) from participles used within the verbal complex (VVPP). We tried several different definitions, depending on the syntactic constructions in which the ambiguous word forms appear.
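
The granularity changes of Type Ia and Ib can be pictured as mappings from a fine-grained tag inventory to the inventory used in a given test. The following Python sketch is only an illustration under stated assumptions: the placeholder tags A, B and AB follow the notation above, while the configuration names, the merged tag names ("A+AB", "AB+B") and the helper functions remap and accuracy are hypothetical and not part of the original experiments. In particular, projecting existing tagger output onto a coarser tagset only approximates a real test, in which the tagger would be retrained on the modified tagset.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word form, tag)

# Hypothetical encoding of the Type Ib test configurations.
# Each mapping sends a fine-grained tag to the tag used in that test.
CONFIGURATIONS: Dict[str, Dict[str, str]] = {
    # 1. one single ambiguous tag for all forms
    "merge_all": {"A": "AB", "B": "AB", "AB": "AB"},
    # 2. three separate tags
    "three_way": {"A": "A", "B": "B", "AB": "AB"},
    # 3. ambiguous forms merged with one of the unambiguous tags
    "ab_with_a": {"A": "A+AB", "AB": "A+AB", "B": "B"},
    "ab_with_b": {"A": "A", "AB": "AB+B", "B": "AB+B"},
}


def remap(corpus: List[Token], mapping: Dict[str, str]) -> List[Token]:
    """Relabel a tagged corpus; tags absent from the mapping stay unchanged."""
    return [(word, mapping.get(tag, tag)) for word, tag in corpus]


def accuracy(gold: List[Token], predicted: List[Token]) -> float:
    """Share of tokens whose predicted tag equals the reference tag."""
    if len(gold) != len(predicted):
        raise ValueError("corpora must be aligned token by token")
    if not gold:
        return 0.0
    correct = sum(g[1] == p[1] for g, p in zip(gold, predicted))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy data with placeholder word forms; "X" stands for any unaffected tag.
    gold = [("w1", "A"), ("w2", "B"), ("w3", "AB"), ("w4", "X")]
    pred = [("w1", "A"), ("w2", "B"), ("w3", "A"), ("w4", "X")]
    for name, mapping in CONFIGURATIONS.items():
        score = accuracy(remap(gold, mapping), remap(pred, mapping))
        print(f"{name}: {score:.2f}")
```

Keeping each configuration as an explicit mapping makes it easy to apply the same relabeling to the reference text and to the tagger output, so that accuracy figures for different configurations are computed on identical tokens.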


