Next: Countability Up: Application to Greek Previous: Number

Preliminary Recommendations


Attribute   Value          Gr. example   Gr. tag
Case        nom            vivlio        NoCmNeSgNm
            gen            vivliou       NoCmNeSgGe
            acc            vivlio        NoCmNeSgAc
            dat            (suneheia)    NoCmNeSgDt
            l-spec voc     Ghianny       NoPrMaSgVo
            l-spec indcl   asanser       NoCmNeNvIc
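
The tags in the table can be read mechanically. The following is a minimal sketch, assuming that every tag is a concatenation of two-character codes and that the final code carries the Case value; both assumptions hold for the six tags shown here but are not stated for the tagset as a whole. The function names `split_tag` and `case_value` are illustrative only.

```python
# Case codes read off the final two characters of the tags in the table.
CASE_CODES = {
    "Nm": "nom",
    "Ge": "gen",
    "Ac": "acc",
    "Dt": "dat",
    "Vo": "voc",
    "Ic": "indcl",
}


def split_tag(tag):
    """Split a tag such as 'NoCmNeSgNm' into two-character fields.

    Assumption: all fields are exactly two characters long.
    """
    if len(tag) % 2 != 0:
        raise ValueError(f"tag length must be even: {tag!r}")
    return [tag[i:i + 2] for i in range(0, len(tag), 2)]


def case_value(tag):
    """Return the Case value encoded in the last field of a tag.

    Assumption: the Case code is always the final field.
    """
    code = split_tag(tag)[-1]
    if code not in CASE_CODES:
        raise ValueError(f"unknown case code {code!r} in tag {tag!r}")
    return CASE_CODES[code]


if __name__ == "__main__":
    for tag in ("NoCmNeSgNm", "NoCmNeSgDt", "NoCmNeNvIc"):
        print(tag, split_tag(tag), case_value(tag))
```

Only the Case field is interpreted; the remaining fields are returned as raw codes because their values are not defined in this section.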

As regards the value dat, we must report that it was included in our tagset only because corpus annotation required it.

The problem arises from the phenomenon of diglossia in Modern Greek, i.e. the coexistence of dimotiki and katharevousa forms in the language. Although dimotiki has been the official language of Greece since 1974, many katharevousa forms are still actively used, mainly in the written language.

Thus the dative case is used in the ILSP tagset although it does not belong to the current inflectional system of Modern Greek, which has only four case values: nominative, genitive, accusative and vocative. However, certain word forms inherited from katharevousa, mostly within fixed phrases, still carry this case value, which obliges us not only to recognise them but also to characterise them. In the future, we plan a mechanism that will recognise and appropriately tag groups of words, i.e. two or more words functioning together as one (e.g. complex adverbs, complex prepositions, fixed expressions, etc.):

`en suneheia' (complex adverb, consisting of a prep. and a noun in the dative case).
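
A mechanism of the kind planned above could be approximated by a lookup over a lexicon of fixed expressions. The sketch below is an assumption about how such grouping might work, not a description of the ILSP mechanism: the lexicon `FIXED_PHRASES`, the function `group_fixed_phrases` and the label string are all hypothetical, and only the entry `en suneheia` comes from the text.

```python
# Hypothetical lexicon: tuple of lower-cased words -> category label.
# Only 'en suneheia' is taken from the text; the label is illustrative.
FIXED_PHRASES = {
    ("en", "suneheia"): "complex adverb",
}


def group_fixed_phrases(tokens, lexicon=FIXED_PHRASES):
    """Group tokens into multi-word units by greedy longest match.

    Returns a list of (text, label) pairs; label is None for tokens
    that are not part of any fixed phrase in the lexicon.
    """
    max_len = max((len(key) for key in lexicon), default=1)
    grouped = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                grouped.append((" ".join(tokens[i:i + n]), lexicon[key]))
                i += n
                break
        else:
            # No phrase starts here: keep the single token as it is.
            grouped.append((tokens[i], None))
            i += 1
    return grouped


if __name__ == "__main__":
    print(group_fixed_phrases(["to", "vivlio", "en", "suneheia"]))
```

Each grouped unit could then receive a single tag, while the noun inside it would still be recorded with the value dat.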

The value indcl, mutually exclusive with the other values, is used in Greek for words that retain the same form in whichever case they are found, either in the singular or in the plural number. As commented upon earlier, this value mainly applies to foreign words which have been incorporated in the Greek language without having taken the morphological characteristics of the Greek inflectional system. The case value could be disambiguated taking into account the linguistic context in corpora, on the basis of agreement features, although, for reasons of uniformity and in order to avoid further complication of the disambiguation process, at present it is not disambiguated:

``To asanser vrisketai sto isogheio'' (nominative)
``Vryka tyn porta tou asanser anoihty'' (genitive).

The vocative case (value voc) is rarely used in written texts; it is found only in literature and, more generally, in oral or written dialogues. The sentence containing it often ends with an exclamation mark:

``Ghianny, ela edhw!''.

The above two values, indcl and voc, are specific to the Greek language and therefore belong to Level 2b.

It is often the case (as regards nouns of the feminine or neuter gender) that the same form is used for both the nominative and the accusative case. Although a new value, nom-acc, would contribute to economy in the lexicon, we have decided to keep the two as distinct values and to resolve the ambiguity in the process of corpus tagging. Disambiguation can only be made on the basis of the linguistic context, either simply by inspecting the modifying article/adjective or, if that is not sufficient, by resorting to shallow syntactic analysis (subject/object role):

``Y karekla einai sto dhiplano dhwmatio'' (nom.)
``Koitazei tyn karekla'' (acc.)
``To vivlio vrisketai sto trapezi'' (subj. - nom.)
``Vryka to vivlio'' (obj. - acc.)
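
The two-step procedure illustrated by these examples can be sketched as follows. This is a minimal illustration under stated assumptions, not the tagger's actual implementation: the article table `ARTICLE_CASE` covers only the three articles that occur in the examples (in the transliteration used here), the role labels `subj` and `obj` are assumed to come from some shallow syntactic analysis that is not shown, and the function name `resolve_case` is hypothetical.

```python
# Cases compatible with each article, drawn only from the four examples.
# This is not a full article paradigm.
ARTICLE_CASE = {
    "y": {"nom"},          # ``Y karekla ...'' (feminine)
    "tyn": {"acc"},        # ``... tyn karekla'' (feminine)
    "to": {"nom", "acc"},  # ``to vivlio'' (neuter): the article is ambiguous
}

# Fallback: syntactic role supplied by shallow analysis (assumed input).
ROLE_CASE = {"subj": "nom", "obj": "acc"}


def resolve_case(article, role=None):
    """Choose between nom and acc for a noun with an ambiguous form.

    Step 1: inspect the modifying article.
    Step 2: if the article does not decide, use the subject/object role.
    Returns None when neither source of evidence resolves the ambiguity.
    """
    candidates = ARTICLE_CASE.get(article.lower(), {"nom", "acc"})
    if len(candidates) == 1:
        return next(iter(candidates))
    return ROLE_CASE.get(role)


if __name__ == "__main__":
    print(resolve_case("Y"))           # article alone decides
    print(resolve_case("tyn"))         # article alone decides
    print(resolve_case("to", "subj"))  # role decides
    print(resolve_case("to", "obj"))   # role decides
```

Returning `None` rather than a default keeps unresolved tokens visible, so that they can be passed on to a later disambiguation step.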
