
Xerox HMM-tagger: Verb test

The standard tagset distinguishes modal, auxiliary and full verbs. This is a purely lexical distinction, however, and does not reflect the syntactic function of the verb in a given sentence. The verb forms themselves are therefore not ambiguous with respect to this subclassification, but the subclassification should contribute to higher accuracy in the disambiguation of non-finite and finite verb forms.

The following table shows the errors caused by confusing VAINF/VAFIN, VMINF/VMFIN and VVINF/VVFIN in the standard test. In German, all infinitives (except for ``sein'') are ambiguous with finite verb forms (1st and 3rd person plural, present tense).

                                  training corpus   test corpus
V.INF in ambiguity class                    3,084           584
  tagged as V.INF                                           263   (45.0 %)
    incorrect                                                42   (16.0 %)
      instead of V.FIN                                       40    95.2 %
V.FIN in ambiguity class                    9,087         1,817
  tagged as V.FIN                                           947   (52.1 %)
    incorrect                                                25   (2.6 %)
      instead of V.INF                                       13    52.0 %
V.FIN V.INF in ambiguity class              2,884           547
  tagged as V.INF                                           247   (45.2 %)
    incorrect                                                41   (16.6 %)
      instead of V.FIN                                       40    97.6 %
  tagged as V.FIN                                           119   (21.8 %)
    incorrect                                                13   (10.1 %)
      instead of V.INF                                       13    100 %
  tagged V.INF V.FIN                                        366   (66.9 %)
    incorrect                                                54   (14.8 %)
      inverted V.FIN-V.INF                                   53    98.2 %
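
The percentages in the table are nested: each one is taken relative to the count one level up. The following Python sketch shows this reading for the first block of the table (V.INF in ambiguity class, test corpus). The helper name `pct` and the variable names are ours and serve only as illustration.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Test-corpus counts from the first block of the table above.
in_ambiguity_class = 584   # tokens whose ambiguity class contains V.INF
tagged_as_vinf = 263       # of those, tokens the tagger labelled V.INF
incorrect = 42             # of those, tokens labelled V.INF wrongly
instead_of_vfin = 40       # of those, tokens that should have been V.FIN

share_tagged = pct(tagged_as_vinf, in_ambiguity_class)
share_incorrect = pct(incorrect, tagged_as_vinf)
share_vfin = pct(instead_of_vfin, incorrect)

print(share_tagged, share_incorrect, share_vfin)
```

The three values correspond to the 45.0 %, 16.0 % and 95.2 % given in the table.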

Test: Do not distinguish VA-, VM- and VV- verb forms

Corpus statistics
                    training corpus   test corpus
Tokens                        62860         13416
Tags                             45            41
Lexicon gaps                   1756           283
Lexical errors                  542            65
Ambiguity classes               251           189
Ambiguity rate                 1.69          1.67

Error statistics
ambiguity   tokens   in %   correct   in %    LE   in %    DE   in %
1             7978   59.5      7942   99.6    36    0.5     -      -
2             2663   19.9      2474   92.9    13    0.5   176    6.6
3             2078   15.5      2013   96.9     8    0.4    57    2.7
4              589    4.4       518   88.0     7    1.2    64   10.9
5               81    0.6        71   87.9     1    1.2     9   11.1
6               19    0.1        16   84.2     -      -     3   15.8
7                8    0.1         7   87.5     -      -     1   12.5
total        13416  100.0     13041   97.2    65    0.5   310    2.3
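
The rows of the error statistics can be cross-checked against the total line. The Python sketch below is only an illustration of that check. It assumes that LE stands for the lexical errors listed in the corpus statistics (the test-corpus figure of 65 matches) and that every token is counted exactly once as correct, LE or DE. A dash in the table is entered as 0.

```python
# Rows of the error statistics: (ambiguity, tokens, correct, LE, DE).
rows = [
    (1, 7978, 7942, 36, 0),
    (2, 2663, 2474, 13, 176),
    (3, 2078, 2013, 8, 57),
    (4, 589, 518, 7, 64),
    (5, 81, 71, 1, 9),
    (6, 19, 16, 0, 3),
    (7, 8, 7, 0, 1),
]

# Assumption: each token is either correct, an LE or a DE.
for ambiguity, tokens, correct, le, de in rows:
    assert tokens == correct + le + de, ambiguity

total_tokens = sum(r[1] for r in rows)
total_correct = sum(r[2] for r in rows)
total_le = sum(r[3] for r in rows)
total_de = sum(r[4] for r in rows)

# Overall accuracy on the test corpus, in percent.
accuracy = round(100.0 * total_correct / total_tokens, 1)
print(total_tokens, total_correct, total_le, total_de, accuracy)
```

The sums reproduce the total line of the table (13416 tokens, 13041 correct, 65 LE, 310 DE, 97.2 % correct).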

Most frequent errors (by word form)
number   word          correct tag   tagger tag
    13   DM            NN            NE
     9   Osthold       NE            ADJD
     6   werden        VFIN          VINF
     6   das           PDS           ART
     5   Reich         NE            NN
     4   haben         VFIN          VINF
     4   dem           PRELS         ART
     4   Um            KOUI          APPR
     4   Deutschland   NE            NN

Most frequent errors (by tags)
number   correct tag   tagger tag
    55   NN            NE
    46   VFIN          VINF
    39   NE            NN
    16   VFIN          VPP
    14   ADJD          VPP
    12   KON           ADV
    11   VINF          VFIN
    11   ADJD          ADV
    10   NE            ADJD

The results of this test should again be interpreted against the figures given in section 6.1.2; we have again concentrated on the Xerox HMM-tagger to make the comparison easy.

In this test, the number of distinct tags occurring in the training and in the test corpus is lower than in section 6.1.2. This is expected: we have merged the tags ``VAFIN'' and ``VMFIN'' into the class ``VVFIN'', and we have merged the respective infinitive and participle tags analogously. The corpus ambiguity rates, however, remain the same, since the verb forms were not ambiguous between the merged subclasses in the first place.
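
The merging step can be pictured as a simple relabelling of the corpus. The following Python sketch is ours and not the procedure actually used for the test. It assumes the merged tag names VFIN, VINF and VPP that appear in the result tables, and it assumes that all tags other than the nine VA-/VM-/VV- tags for finite, infinitive and participle forms are left untouched. The function names and the sample tokens are hypothetical.

```python
import re

# Matches VAFIN, VMFIN, VVFIN, VAINF, VMINF, VVINF, VAPP, VMPP, VVPP.
_VERB_TAG = re.compile(r"^V[AMV](FIN|INF|PP)$")

def merge_verb_tag(tag):
    """Drop the VA-/VM-/VV- subclass letter, e.g. VAFIN -> VFIN."""
    return _VERB_TAG.sub(r"V\1", tag)

def merge_corpus(tagged_tokens):
    """Relabel a list of (word, tag) pairs; non-verb tags stay as they are."""
    return [(word, merge_verb_tag(tag)) for word, tag in tagged_tokens]

# Hypothetical sample; the word/tag pairs are for illustration only.
sample = [("werden", "VAFIN"), ("haben", "VAINF"), ("Reich", "NE")]
merged = merge_corpus(sample)
print(merged)
```

Because each token keeps exactly as many candidate tags as before, a relabelling of this kind lowers the number of tags without touching the ambiguity rate.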

The figures in the error statistics have not changed substantially. The number of errors has increased slightly, but the changes do not seem to be significant.

The impact of the merging of the verb-subclasses on the overall treatment of verbs in the tagging process seems to be rather small.

                                  training corpus   test corpus
VINF in ambiguity class                     3,084           584
  tagged as VINF                                            271   (46.4 %)
    incorrect                                                48   (17.7 %)
      instead of VFIN                                        46    95.8 %
VFIN in ambiguity class                     9,087         1,817
  tagged as VFIN                                            940   (51.7 %)
    incorrect                                                23   (2.5 %)
      instead of VINF                                        11    47.8 %
VFIN VINF in ambiguity class                2,884           547
  tagged as VINF                                            255   (46.6 %)
    incorrect                                                47   (18.4 %)
      instead of VFIN                                        46    97.9 %
  tagged as VFIN                                            111   (20.3 %)
    incorrect                                                11   (9.9 %)
      instead of VINF                                        11    100 %
  tagged VINF VFIN                                          366   (66.9 %)
    incorrect                                                58   (15.9 %)
      inverted VFIN-VINF                                     57    98.3 %

Without the distinction between VA-, VM- and VV- forms, there are slightly more errors within the confusion class VINF/VFIN (58 versus 54 in the standard test).
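
Whether a difference of 58 versus 54 errors carries any weight can be estimated roughly with a two-proportion z-test. The Python sketch below is ours and was not part of the evaluation. It takes the 366 tokens tagged as infinitive or finite verb as the sample size for both runs and treats the two runs as independent samples, which is a simplification because both runs tag the same tokens. The function name is hypothetical.

```python
import math

def two_proportion_z(errors_a, errors_b, n):
    """z statistic and two-sided p-value for two error counts out of n each."""
    p_a = errors_a / n
    p_b = errors_b / n
    pooled = (errors_a + errors_b) / (2 * n)
    # Standard error of the difference under the pooled error rate.
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# 54 errors in the standard test, 58 errors with merged verb tags.
z, p_value = two_proportion_z(54, 58, 366)
print(round(z, 2), round(p_value, 2))
```

Under these assumptions z comes out at about 0.4, well below the value of 1.96 usually required at the 5 % level, which agrees with the impression that the merging has only a small effect.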


