Interaction with European projects

Preliminary Recommendations

As already mentioned in the introduction, the experience gained from applying the EAGLES proposal in several European and national projects had a strong influence on the set of recommendations. The specifications have therefore been cyclically revised according to the requirements that emerged while working in the framework of these projects.

Below, the experiences of MULTEXT and PAROLE are summarised, and the reciprocal influence between these two projects and EAGLES is reported. The work carried out in the validation phase of the CLWG is also mentioned. It is worth recalling that much useful feedback has also been received from the use of the present proposal in other projects, which has allowed its impact to be assessed further.

The LRE-MULTEXT experience

The main objectives of the MULTEXT project were the definition and the implementation of a set of tools for corpus-based research and applications, and the production of a corpus in a multilingual framework. Tools and resources were developed on the basis of operational standards and in the light of the conventions which are being defined by the major international projects dealing with the issue of standardisation.

One of the MULTEXT tasks dealt with annotation conventions and was hence strongly connected with the work presented in this document. It aimed at formulating:

Common specifications and a common notation for the MULTEXT lexicon; and
A tagset for the MULTEXT corpus on which the tools would run.

The MULTEXT partners involved in this task carefully evaluated the EAGLES proposal -- as presented in a preliminary version of the present document -- for each PoS at Level 1 (the level of recommended features). After a global evaluation of the proposal, taking into account different grammatical traditions and different language requirements, they checked whether the features suited the description of their respective languages and added any features and/or values needed at the language-specific level. All partners carried out the evaluation by translating their lexicons, whether existing or still under development, into the EAGLES features and values, i.e. by applying the proposal concretely to their languages and providing examples of all the legal combinations of values for each category. In this way, constraints on the application of the values emerged (see the application to the French lexicon, included in the present report).
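
The idea of listing all the legal combinations of values for a category can be sketched as follows. This is only an illustration under stated assumptions: the attributes, the values, and the single constraint are hypothetical and are not taken from the EAGLES or MULTEXT specifications.

```python
from itertools import product

# Hypothetical attribute inventory for one category (illustrative only).
PRONOUN = {
    "Type": ["personal", "demonstrative"],
    "Person": ["1", "2", "3"],
    "Number": ["singular", "plural"],
}

def is_legal(entry):
    """Hypothetical constraint: demonstratives occur only in the third person."""
    if entry["Type"] == "demonstrative" and entry["Person"] != "3":
        return False
    return True

def legal_combinations(features, predicate):
    """Yield every attribute-value combination accepted by the predicate."""
    names = list(features)
    for values in product(*(features[name] for name in names)):
        entry = dict(zip(names, values))
        if predicate(entry):
            yield entry

combos = list(legal_combinations(PRONOUN, is_legal))
```

Enumerating the combinations in this way makes the constraints explicit: any combination that a partner could not exemplify for a language points to a restriction that the specification should record.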

The MULTEXT experience turned out to be an important test-bed for the EAGLES Lexicon proposal as:

A large core of lexicon specifications proved to fit the description of all six MULTEXT languages (Dutch, German, English, French, Italian and Spanish);
The cycle of testing and concrete application stressed the need for further specifications at the language-specific level.

An essential change affected the class of Pronouns, which in a preceding version of the EAGLES Lexicon document incorporated the Determiners. Merging the two categories (at least at L1, with the possibility of splitting them at a more fine-grained level) seemed at first to be the best way to cope with the requirements of many corpus practices, which keep the two categories undistinguished, and the best attempt to reconcile lexicon specifications and corpus tagsets. This choice, however, was eventually felt to be too corpus-oriented, and the MULTEXT partners expressed a preference for two different categories, Pronouns and Determiners, at the lexicon level. Lexical descriptions, it was recognised, should aim at a general and fine-grained description of the language that is independent of particular applications. For practical reasons -- state-of-the-art tagging techniques and computability (see Bel et al. (1995)) -- broader categories are instead to be preferred for tagsets, where collapsing decisions have to be made.

The requirements that emerged from the MULTEXT feedback are therefore reflected in the lexicon specifications of the present version of this document.
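
The relation between fine-grained lexicon categories and a broader corpus tagset can be pictured as a many-to-one mapping. The sketch below is a minimal illustration; the tag label `PD` and the function name are hypothetical and do not come from the EAGLES or MULTEXT tagsets.

```python
# Hypothetical collapsing table: lexicon categories kept distinct in the
# lexicon are mapped onto one broader corpus tag (labels are illustrative).
COLLAPSE = {
    "Pronoun": "PD",
    "Determiner": "PD",
}

def corpus_tag(lexicon_category):
    """Return the corpus tag for a lexicon category; uncollapsed ones pass through."""
    return COLLAPSE.get(lexicon_category, lexicon_category)
```

Keeping the distinction in the lexicon and collapsing only at tagging time means the mapping can be changed for a given application without touching the lexical description itself.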

It is also worth mentioning here that the EAGLES/MULTEXT specifications were extended and adapted to cover the requirements and peculiarities of a number of Eastern European languages, within the framework of the COPERNICUS-MULTEXT-East Project. In particular, the common core of specifications proved to be suitable for the Eastern European languages, and the philosophy of the multilayered proposal was seen to be effective for extensions and adaptations (Monachini (ed), 1995).

The MLAP-PAROLE experience

Within the MLAP-PAROLE project, the EAGLES recommendations for morphosyntactic encoding in lexicons were taken as the starting point for finding and establishing the framework in which either to reuse existing lexical data or to start creating new data, according to harmonised criteria, ensuring their reusability.

The minimal initial technical conditions for actually starting to provide written language resources were defined by translating the EAGLES recommendations into operational standards (Monachini (ed), 1996). The first action of the PAROLE partners towards the definition of operational standards was the production of language-specific instantiations of the EAGLES specifications, bearing in mind the practical criteria and following the steps listed below.

Within the multilayered EAGLES proposal, the most appropriate level of encoding for lexicon resources was identified, i.e. the set of values was isolated that was:
1. Pertinent to each language;
2. Feasible to encode within the constraints of time, money, and lexicon resources already available; and
3. Relevant for a large range of NLP applications.
Once the pertinent set of attribute-value pairs had been circumscribed, the logical relationships between the values of the system pertinent to the language under analysis were formalised in those cases where the values belong to different levels.
Furthermore, the dependencies and constraints on the application of an attribute or value in the presence of another given attribute or value were also specified.
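
The three steps above (a circumscribed set of attribute-value pairs, relationships between values at different levels, and dependencies between attributes) can be sketched as a small data structure with a checker. Everything in this sketch is an assumption made for illustration: the categories, attributes, values, rules, and function names are hypothetical and are not the PAROLE instantiations.

```python
# Step 1 (hypothetical): attribute-value pairs pertinent to one language.
ALLOWED = {
    "Noun": {
        "Gender": {"masculine", "feminine", "neuter", "common"},
        "Number": {"singular", "plural"},
    },
    "Verb": {
        "VForm": {"finite", "infinitive", "participle"},
        "Tense": {"present", "past"},
        "Person": {"1", "2", "3"},
    },
}

# Step 2 (hypothetical): a more generic value subsumes more specific values
# that belong to a different level of the proposal.
SUBSUMES = {("Gender", "common"): {"masculine", "feminine"}}

# Step 3 (hypothetical): an attribute applies only when another attribute
# of the same entry has one of the listed values.
REQUIRES = {
    ("Verb", "Person"): ("VForm", {"finite"}),
    ("Verb", "Tense"): ("VForm", {"finite", "participle"}),
}

def validate(category, features):
    """Return a list of violations for one lexical entry (empty if valid)."""
    allowed = ALLOWED.get(category)
    if allowed is None:
        return [f"unknown category: {category}"]
    errors = []
    for attr, value in features.items():
        if attr not in allowed:
            errors.append(f"{attr} is not defined for {category}")
        elif value not in allowed[attr]:
            errors.append(f"{value} is not a value of {attr}")
        rule = REQUIRES.get((category, attr))
        if rule is not None:
            cond_attr, cond_values = rule
            if features.get(cond_attr) not in cond_values:
                errors.append(f"{attr} requires {cond_attr} in {sorted(cond_values)}")
    return errors

def compatible(attr, a, b):
    """True if two values of attr are equal or one subsumes the other."""
    return (
        a == b
        or b in SUBSUMES.get((attr, a), set())
        or a in SUBSUMES.get((attr, b), set())
    )
```

For instance, under these assumed rules `validate("Verb", {"VForm": "infinitive", "Person": "3"})` reports that Person requires a finite VForm, and `compatible("Gender", "common", "feminine")` holds because the generic value covers the specific one.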

These last two points are, in our opinion, particularly crucial. Within EAGLES, all the proposed lexicon specifications have been collected together in a large repository, articulated on different levels of obligatoriness. Each language-specific application shows the set pertinent to one language and permits us to deduce the relations between values on the basis of the semantics and the actual exemplifications. This, however, can cause misinterpretation. In the PAROLE task, we went a step further and, for each language, provided a readily interpretable formalised system with the aim of obtaining a structured inventory: the logical relationships between the values were specified in order to help others to consistently interpret their semantics.

In order to highlight the difference with respect to the EAGLES language-specific applications, the concrete translations of the EAGLES specifications into operational standards carried out by the PAROLE partners on their languages, according to the criteria listed above, are termed instantiations.
