All tests for tagger evaluation (cf. section 5.2) and for tagset evaluation (cf. section 5.3) use the same training corpus, test corpus, and word lists provided by IMS. Two word lists are used to build the training and testing lexicons:

  1. The regular lexicon contains only those tags that are produced by a morphological analyser and mapping rules.
  2. The augmented lexicon contains all entries of the regular lexicon plus all word forms from the training corpus, together with their manually assigned tags.
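
The relation between the two lexicons can be illustrated with a minimal sketch. It assumes a lexicon is a mapping from word forms to sets of tags and a corpus is a sequence of (word, tag) pairs; the function name `build_augmented_lexicon` and the sample entries are hypothetical and do not reflect the actual IMS data or file formats.

```python
from collections import defaultdict


def build_augmented_lexicon(regular_lexicon, training_corpus):
    """Return a new lexicon: the regular entries plus every (word, tag)
    pair observed in the manually tagged training corpus.

    Assumed representation (not taken from the source):
      regular_lexicon: dict mapping word form -> set of tags
      training_corpus: iterable of (word form, manual tag) pairs
    """
    augmented = defaultdict(set)
    # Start from a copy of the regular lexicon so the input is not modified.
    for word, tags in regular_lexicon.items():
        augmented[word].update(tags)
    # Add each word form of the training corpus with its manual tag.
    for word, tag in training_corpus:
        augmented[word].add(tag)
    return dict(augmented)


# Hypothetical sample data using STTS-style tag names.
regular = {"der": {"ART", "PRELS"}, "Haus": {"NN"}}
corpus = [("der", "PDS"), ("Hund", "NN")]
augmented = build_augmented_lexicon(regular, corpus)
```

In this sketch a word form that is missing from the regular lexicon ("Hund") receives an entry, and a known word form ("der") gains the additional tag seen in the corpus.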

For each test setup in the tagset experiments (cf. section 5.3), we modified the training and test corpora with respect to the tags under evaluation and adapted the associated lexicons accordingly.
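
As a rough illustration of such a setup, the sketch below applies one tag mapping consistently to a corpus and to a lexicon. It assumes that a tagset modification can be written as a simple tag-to-tag mapping (here, merging two tags into one); the mapping, the function names, and the sample data are hypothetical and are not the modifications actually tested.

```python
def map_corpus(corpus, tag_map):
    """Replace each tag in a list of (word, tag) pairs; unmapped tags are kept."""
    return [(word, tag_map.get(tag, tag)) for word, tag in corpus]


def map_lexicon(lexicon, tag_map):
    """Apply the same mapping to every tag set of a word -> set-of-tags lexicon."""
    return {
        word: {tag_map.get(tag, tag) for tag in tags}
        for word, tags in lexicon.items()
    }


# Hypothetical modification: merge two finite-verb tags into one tag "V".
tag_map = {"VVFIN": "V", "VAFIN": "V"}

corpus = [("geht", "VVFIN"), ("ist", "VAFIN"), ("Haus", "NN")]
lexicon = {"ist": {"VAFIN"}, "geht": {"VVFIN"}, "Haus": {"NN"}}

mapped_corpus = map_corpus(corpus, tag_map)
mapped_lexicon = map_lexicon(lexicon, tag_map)
```

Using one mapping for both resources keeps the corpus tags and the lexicon tags consistent within a test setup.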

The tests for text type evaluation (cf. section 6.3) were run on the RXRC corpus collection, which covers different text types and uses a tagset very close to STTS.