ISLE

International Standards
for Language Engineering

 


http://www.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm



 
 
 
 

Annual Report 2001

 

ISLE is the latest in a series of projects under the successful EAGLES initiative (Expert Advisory Group for Language Engineering Standards). It extends EAGLES work, hitherto EU-based, within the EU-US international research cooperation framework, set up as a result of two years of joint preparatory work towards an international, standards-oriented HLT initiative.

The overall goals of ISLE are to support HLT and national projects, and HLT industry in general, by developing, disseminating and promoting widely agreed and urgently needed HLT de facto standards and guidelines, for infrastructural language resources, for the tools that exploit them, and for HLT products. The areas currently targeted by ISLE are: multilingual computational lexicons, natural interaction and multimodality, and evaluation of HLT systems.

A feature of EAGLES/ISLE work is the close interaction between industry and academia, users and providers, funders and beneficiaries.
 

Summary of 2001 Activities

ISLE has made substantial progress in its three spheres of interest. In multilingual lexicons, the initial survey phase has been followed by intensive work towards specification of the Multilingual ISLE Lexical Entry (MILE). This has involved focussing on a complex word pair to gain insights into possibilities for word sense representation and cross-language linkages, extracting and classifying sense indicators, and developing a prototype tool to manage MILE-based lexicons. In evaluation of HLT systems, user feedback obtained through three international workshops has led to a second, refined version of the ISLE evaluation framework. In natural interaction and multimodality (NIMM), major surveys have been completed of resources, annotation schemes and tools, and metadata descriptions and tools. A prototype tool has been developed to annotate NIMM data. XML schemas have been developed to handle ISLE metadata descriptions, together with tools for editing and browsing these descriptions, including across distributed resources. As a whole, the project has been active in dissemination and awareness activities on an international scale.

Multilingual Computational Lexicons

Work towards developing MILE has been oriented towards the needs of several key HLT applications: MT, cross-language information retrieval, cross-language information extraction, multilingual language generation, multilingual authoring and speech-to-speech translation. It is based on previous EAGLES work on monolingual lexicon standards.

Following on from a major survey of important bi- and multilingual lexical resources, ISLE has been working towards recommendations for MILE bilingual dictionary entries, focussing first on development of a prototype entry for a complex Italian-English word pair. This has involved intensive exploitation of available resources such as SIMPLE (based on previous EAGLES recommendations), COMLEX and the British National Corpus. Much work has also been done on extracting and classifying sense indicators used in bilingual dictionaries: analysis of these indicators will lead to more precise specification of bilingual transfer conditions, a hypothesis that has already been tested in work on the above-mentioned complex word pair. At the same time, work to define the MILE lexical data model has started, and existing frameworks and representation schemes have been surveyed. The MILE will be based on an entity-relation model, whose main architecture has already been designed. The model will also include a first repository of common “lexical templates” for the fast development of multilingual lexical entries. Important experience and feedback were gained through involvement in data preparation for the evaluation of 90 word sense disambiguation systems, covering 12 languages, conducted in SENSEVAL-2.
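As a rough illustration of what an entity-relation view of a bilingual entry might look like, the following Python sketch models entries, senses and cross-language links as separate entities. It is not the MILE data model: every class name, field name and data value here is hypothetical and chosen only for illustration, and the example word is not the word pair studied in the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """One sense of a lexical entry (hypothetical entity)."""
    sense_id: str
    gloss: str
    indicators: List[str] = field(default_factory=list)  # sense indicators


@dataclass
class LexicalEntry:
    """A monolingual entry holding one or more senses (hypothetical)."""
    lemma: str
    language: str
    pos: str
    senses: List[Sense] = field(default_factory=list)


@dataclass
class TransferLink:
    """A relation between a source sense and a target sense.

    `conditions` stands in for bilingual transfer conditions of the
    kind that sense indicators are expected to make more precise.
    """
    source_sense: str
    target_sense: str
    conditions: List[str] = field(default_factory=list)


def noun_template(lemma: str, language: str) -> LexicalEntry:
    """A toy 'lexical template': a pre-filled skeleton for a noun entry."""
    return LexicalEntry(lemma=lemma, language=language, pos="noun")


def translations(sense_id: str, links: List[TransferLink]) -> List[str]:
    """Return the target sense ids linked to the given source sense."""
    return [link.target_sense for link in links if link.source_sense == sense_id]


# Invented example data, not taken from any ISLE resource.
entry = noun_template("piano", "it")
entry.senses.append(Sense("piano_1", "musical instrument", ["music"]))
entry.senses.append(Sense("piano_2", "storey of a building", ["building"]))
links = [
    TransferLink("piano_1", "en_piano_1", ["domain=music"]),
    TransferLink("piano_2", "en_floor_1", ["domain=building"]),
]
```

Keeping the links as entities in their own right, rather than as fields of an entry, lets transfer conditions be attached to the relation itself; whether MILE makes the same choice is not stated in this report.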

A prototype tool to manage computational lexicons modelled according to ISLE recommendations has been developed. A public version will be available next year. Another tool is being developed to browse semantic indicators extracted from machine-readable bilingual dictionaries, to support the MILE-based lexicographer.

A workshop was held in Pisa, Italy, bringing together ISLE members and also representatives of industry and academia from several countries outside Europe and the USA, notably from Asia. 

Evaluation of HLT Systems

The focus of work on evaluation is on methods and metrics for Machine Translation (MT) (earlier EAGLES work had looked at other application areas). This has involved investigation of the various published evaluations of MT systems that have been carried out since 1979. However, this work is not being pursued in isolation, as MT is being used as a case study, to enable the later development of: a general theory about the methodology for evaluating HLT applications; and a general framework that can accommodate existing evaluation measures for specific HLT applications. A second version of a specific framework for classifying MT evaluations has been elaborated, illustrating how the current state of the ISLE evaluation methodology can be applied.

This has involved the development of not one, but three parallel taxonomies that describe relevant aspects of the nature and use of MT: user purpose, application process and general software characteristics. This year, the framework has been further refined and further populated with individual evaluation measures, and associated criteria for the application of each measure. Where appropriate, concepts, practices and techniques are linked to published evaluations. The work builds on ISO 9126 and ISO 14598.
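One way to picture a framework built from three parallel taxonomies, each populated with measures and criteria for applying them, is the minimal Python sketch below. The three taxonomy names come from the text above; the `Measure` structure, the node labels and the sample measures are assumptions made for illustration and are not taken from the ISLE framework itself.

```python
from dataclasses import dataclass
from typing import Dict, List

# The three parallel taxonomies named in the report.
TAXONOMIES = ("user purpose", "application process", "software characteristics")


@dataclass
class Measure:
    """An evaluation measure placed in one taxonomy (hypothetical structure)."""
    name: str
    taxonomy: str   # one of TAXONOMIES
    node: str       # illustrative node label within that taxonomy
    criterion: str  # when the measure is appropriate to apply


def group_by_taxonomy(measures: List[Measure]) -> Dict[str, List[str]]:
    """Group measure names under the taxonomy they belong to."""
    grouped: Dict[str, List[str]] = {t: [] for t in TAXONOMIES}
    for m in measures:
        if m.taxonomy not in grouped:
            raise ValueError(f"unknown taxonomy: {m.taxonomy!r}")
        grouped[m.taxonomy].append(m.name)
    return grouped


# Invented sample measures, for illustration only.
measures = [
    Measure("post-editing time", "application process", "revision",
            "output will be revised by translators"),
    Measure("fidelity rating", "software characteristics", "functionality",
            "content accuracy matters more than style"),
    Measure("gisting adequacy", "user purpose", "assimilation",
            "readers only need the general sense of a text"),
]
grouped = group_by_taxonomy(measures)
```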

The results of this work are embodied in a web site that is intended to help three types of user: people who need guidance in choosing an MT system; those who want to compare various MT systems; and those who want to design a new MT system or upgrade an existing one. Each part of the framework has its own comments page, which allows the wider community to take an active part in developing the framework.

An open workshop (documents available) on practical MT evaluation using the initial framework took place in April 2001 in Geneva, and aided development of the second version of the framework. Another one-day workshop was held in Pittsburgh, USA, again involving practical hands-on experience of MT evaluation. Industry was strongly represented in both workshops. A third major workshop at the MT Summit VIII served to synthesise the results of the first two and to arrive at preliminary conclusions that will feed into the end-of-project guidelines.
 

Natural Interaction and Multimodality (NIMM)

NIMM is concerned with extending previous EAGLES work to cater for the needs of more than textual and spoken language resources, given the growing importance of natural interaction with information systems and the multimodal nature of such interaction, involving multimedia.

Work in this period has been on finalising surveys of NIMM data resources, annotation schemes and tools (contribute your own tool description for the survey), and metadata aspects of multimodal language resources, with a view to elaborating guidelines on the representation, annotation and description of such resources in a standard way.

NIMM sites are available in Europe and the USA.

User requirements for a tool for NIMM data annotation have been drawn up and prototype tool elements have been developed on the basis of these.

Further information specifically on the ISLE Metadata Initiative (IMDI) is available, including XML schemas for session and catalogue metadata descriptions, for an IMDI-OLAC mapping and for description of lexicon metadata; a user-friendly editor for metadata descriptions (including converters from legacy data); and a tool to allow browsing in complex, distributed metadata descriptions and launch of other tools on retrieved resources. A demonstration of this latter tool, involving resources from six European institutes and prepared for the opening of the European Year of Languages, is available.
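To give a feel for what an XML session description and a simple browse over several descriptions involve, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`Session`, `Name`, `Date`, `Languages`, `Language`) and the sample values are illustrative assumptions and should not be read as the actual IMDI schema; the real schemas are the authority.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_session(name: str, date: str, languages: List[str]) -> ET.Element:
    """Build a toy session description (element names are hypothetical)."""
    session = ET.Element("Session")
    ET.SubElement(session, "Name").text = name
    ET.SubElement(session, "Date").text = date
    langs = ET.SubElement(session, "Languages")
    for code in languages:
        ET.SubElement(langs, "Language").text = code
    return session


def find_sessions_by_language(sessions: List[ET.Element], code: str) -> List[str]:
    """Return the names of sessions that list the given language code."""
    return [
        s.findtext("Name")
        for s in sessions
        if code in [lang.text for lang in s.iter("Language")]
    ]


# Invented sample descriptions, standing in for distributed resources.
sessions = [
    build_session("demo-session-1", "2001-01-01", ["it", "en"]),
    build_session("demo-session-2", "2001-02-01", ["nl"]),
]
xml_text = ET.tostring(sessions[0], encoding="unicode")
matches = find_sessions_by_language(sessions, "it")
```

A real browser over distributed descriptions would fetch and validate each document against the published schema before querying it; this sketch skips both steps.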

A workshop on linguistic databases, in Philadelphia, USA, included consideration of work on ISLE standards.

User Group, Promotion and Awareness

ISLE, given its standards-oriented profile, is committed to dissemination activities with a view to engendering and enhancing consensus regarding its recommendations and guidelines. This year has seen ISLE also establishing and developing contacts with countries in Asia and the Americas. Steps have already been taken to formalise cooperation with Asian countries. Events that project members took part in included:

1st International Workshop on Multimedia Annotation, Tokyo, Japan
Vilem Mathesius Lecture Series 16, Prague
Workshop on XML Markup Technologies for Working with Linguistic Data, Edinburgh, UK
Workshop on Multimodal Communication and Context in Embodied Agents, Montreal, Canada
International E-MELD Meeting, Santa Barbara, California, USA

Publications by the project, apart from those associated with project-organised workshops mentioned above, include:

Future Work

In the last phase of the project, work will be concentrated on producing draft guidelines for best practice in the areas being covered by ISLE and also on refining and documenting the tools and exemplary resources that are intended to help users in applying the guidelines.

Further contact and dissemination activities are planned to ensure the widest possible feedback on the guidelines and to increase the level of cooperation with other countries.

Further Information

The EAGLES Secretariat welcomes feedback and enquiries regarding the work of ISLE.

EAGLES Secretariat
CNR - Consiglio Nazionale delle Ricerche
ILC - Istituto di Linguistica Computazionale
Area della Ricerca di Pisa San Cataldo
Via Moruzzi N° 1
56124 Pisa
ITALY
Phone: [+39] 050 315 2873
Fax 1: [+39] 050 315 2834
Fax 2: [+39] 050 315 2839

ISLE reports are placed on the ISLE web server as they receive approval for dissemination (i.e. as they are considered to represent a consensus view). Further details of the project together with links to earlier EAGLES work may also be found at this location.