TreeTagger: Standard test

 

The TreeTagger tests were run with the extended lexicon. As a result, every word form in the training corpus is covered by the lexicon.

Corpus statistics
                  training corpus  test corpus
Tokens                      62860        13416
Tags                           51           46
Lexicon gaps                    0          240
Lexical errors                  0           62
Ambiguity classes
Ambiguity rate 1.58

Error statistics
ambiguity  tokens   in %  correct   in %    LE   in %    DE   in %
        1    8138   60.6     8088   99.4    50    0.6     -      -
        2    3275   24.4     3084   94.2     7    0.2   184    5.6
        3    1533   11.4     1432   93.4     2    0.1    99    6.5
        4     424    3.2      376   88.7     1    0.2    47   11.1
        5      14    0.1        9   64.3     0      -     5   35.7
        6       9    0.1        6   66.7     0      -     3   33.3
        7      15    0.1        9   60.0     2   13.3     4   26.7
        8       6    0.0        4   66.7     0      -     2   33.3
        9       0      -        0      -     0      -     0      -
       10       2    0.0        0      -     0      -     2  100.0
    total   13416  100.0    13008   97.0    62    0.5   346    2.6
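The totals in the error statistics can be re-derived from the per-ambiguity rows. The following is a minimal sketch of that check, not part of the original evaluation; the names `ROWS` and `summarize` are illustrative. It also computes a token-weighted mean ambiguity, on the assumption (not stated in the source) that the "ambiguity rate" is the average number of candidate tags per token.

```python
# Rows copied from the error statistics table:
# (ambiguity, tokens, correct, LE, DE)
ROWS = [
    (1, 8138, 8088, 50, 0),
    (2, 3275, 3084, 7, 184),
    (3, 1533, 1432, 2, 99),
    (4, 424, 376, 1, 47),
    (5, 14, 9, 0, 5),
    (6, 9, 6, 0, 3),
    (7, 15, 9, 2, 4),
    (8, 6, 4, 0, 2),
    (9, 0, 0, 0, 0),
    (10, 2, 0, 0, 2),
]


def summarize(rows):
    """Sum the per-ambiguity counts and derive overall rates."""
    for ambiguity, tokens, correct, le, de in rows:
        # Each token is counted as correct, as an LE, or as a DE.
        assert correct + le + de == tokens, f"row {ambiguity} is inconsistent"

    tokens = sum(r[1] for r in rows)
    correct = sum(r[2] for r in rows)
    le = sum(r[3] for r in rows)
    de = sum(r[4] for r in rows)
    # Assumption: ambiguity rate = mean number of candidate tags per token.
    weighted = sum(r[0] * r[1] for r in rows)
    return {
        "tokens": tokens,
        "correct": correct,
        "le": le,
        "de": de,
        "accuracy_pct": 100.0 * correct / tokens,
        "mean_ambiguity": weighted / tokens,
    }


summary = summarize(ROWS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under that assumption the mean ambiguity of the test corpus comes out at about 1.586, which is close to the ambiguity rate of 1.58 listed in the corpus statistics.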

Most frequent errors (by word form)
number  word     correct tag  tagger tag
    13  um       KOUI         APPR
     9  werden   VAINF        VAFIN
     9  Osthold  NE           ADJD
     8  Aber     KON          ADV
     8  der      PRELS        ART
     8  das      PDS          ART
     7  dem      PRELS        ART
     6  Stänner  NE           NN
     6  Brück    NE           NN
     5  die      PRELS        ART

Most frequent errors (by tags)
number  correct tag  tagger tag
   112  NE           NN
    24  VVINF        VVFIN
    21  VVFIN        VVINF
    21  PRELS        ART
    18  KON          ADV
    17  KOUI         APPR
    12  VAINF        VAFIN
    11  VVFIN        VVPP
    11  NN           NE
    11  NE           ADJD
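The two error lists can be related to each other by summing the word-form rows per tag pair. The sketch below does this for the ten listed word forms only; `WORD_ERRORS`, `TAG_ERRORS` and `aggregate` are illustrative names, and the result is a lower bound per tag pair because less frequent word forms are not listed in the source.

```python
from collections import Counter

# (count, word, correct tag, tagger tag), copied from the by-word table.
WORD_ERRORS = [
    (13, "um", "KOUI", "APPR"),
    (9, "werden", "VAINF", "VAFIN"),
    (9, "Osthold", "NE", "ADJD"),
    (8, "Aber", "KON", "ADV"),
    (8, "der", "PRELS", "ART"),
    (8, "das", "PDS", "ART"),
    (7, "dem", "PRELS", "ART"),
    (6, "Stänner", "NE", "NN"),
    (6, "Brück", "NE", "NN"),
    (5, "die", "PRELS", "ART"),
]

# (correct tag, tagger tag) -> count, copied from the by-tag table.
TAG_ERRORS = {
    ("NE", "NN"): 112,
    ("VVINF", "VVFIN"): 24,
    ("VVFIN", "VVINF"): 21,
    ("PRELS", "ART"): 21,
    ("KON", "ADV"): 18,
    ("KOUI", "APPR"): 17,
    ("VAINF", "VAFIN"): 12,
    ("VVFIN", "VVPP"): 11,
    ("NN", "NE"): 11,
    ("NE", "ADJD"): 11,
}


def aggregate(word_errors):
    """Sum word-level error counts per (correct tag, tagger tag) pair."""
    pairs = Counter()
    for count, _word, correct_tag, tagger_tag in word_errors:
        pairs[(correct_tag, tagger_tag)] += count
    return pairs


pairs = aggregate(WORD_ERRORS)
for (correct_tag, tagger_tag), count in pairs.most_common():
    # Pairs outside the top ten of the by-tag table have no known total.
    total = TAG_ERRORS.get((correct_tag, tagger_tag), "n/a")
    print(f"{correct_tag} -> {tagger_tag}: {count} of {total}")
```

Read this way, the listed word forms der, dem and die account for 20 of the 21 PRELS/ART confusions, whereas Stänner and Brück cover only 12 of the 112 NE/NN confusions, which suggests that the NE/NN errors are spread over many different word forms.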


