
Foreign Material

Problem:
There is more foreign material (FM) in the corpus than expected, including multi-word foreign names of entities. Some of this material is integrated into German syntax (names); some is not (film titles or translations). Proverbial sayings from Latin, French or English are inserted and have a function as a whole, but no single function can be assigned to their parts, because we are not describing Latin, French or English parts of speech. Is the length of the inclusion relevant? What does the language model do with chains of FM?

IMS/TUE:
All foreign material had to be sorted into a special class. By analogy with z.B./ADV: last/ADV but/ADV not/ADV least/ADV

Tagging practice STTS:
Names of cities, persons and institutions are tagged as proper names (NE) where this is certain. Foreign common nouns that are already lexicalized in German (Yoga, Joghurt, Jeans) are to be tagged as NN. All other foreign material is to be tagged as FM.
  • Der beliebte Film `` A/FM fish/FM called/FM Wanda/NE ''.
  • per/FM se/FM ist das kein Problem.
  • der berühmte ``dedazo/FM'' (Fingerzeig) funktioniert noch immer.
  • Sie essen heute a/FM la/FM carte/FM
  • er war damals schon lange persona/FM non/FM grata/FM.
  • er ist primus/FM inter/FM pares/FM
  • Der spanische Titel `` Mi/FM querido/FM Tim/NE Mix/NE ''
  • und dann, last/FM but/FM not/FM least/FM, gibt es ein gutes Mittagessen.
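
The tagging practice above can be read as a small decision rule. The following is only a minimal sketch of that reading, not part of the STTS guidelines: the word lists `KNOWN_NAMES` and `LEXICALIZED_NOUNS` and the function names are hypothetical, and the sketch assumes that the tokens passed in have already been recognised as belonging to a foreign-language inclusion. The helper `fm_chains` illustrates what a "chain of FM" looks like in tagged output.

```python
# Hypothetical word lists for illustration only; a real tagger would
# rely on full lexicons and on context, not on fixed sets.
KNOWN_NAMES = {"Wanda", "Tim", "Mix"}
LEXICALIZED_NOUNS = {"Yoga", "Joghurt", "Jeans"}


def tag_foreign_token(token):
    """Tag one token from a foreign-language inclusion (assumed rule order)."""
    if token in KNOWN_NAMES:
        return "NE"  # proper name, where this is certain
    if token in LEXICALIZED_NOUNS:
        return "NN"  # foreign common noun already lexicalized in German
    return "FM"      # all other foreign material


def fm_chains(tagged):
    """Return maximal runs of consecutive FM words from (word, tag) pairs."""
    chains, current = [], []
    for word, tag in tagged:
        if tag == "FM":
            current.append(word)
        else:
            if current:
                chains.append(current)
            current = []
    if current:
        chains.append(current)
    return chains


title = ["A", "fish", "called", "Wanda"]
tagged_title = [(word, tag_foreign_token(word)) for word in title]
```

Applied to the film title from the first example, the sketch tags the first three words as FM and `Wanda` as NE, so `fm_chains(tagged_title)` yields one chain of three words.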

Tests:
I
  1. simple granularity test
  2. inclusion in the test on NN/NE.