ISLE

International Standards
for Language Engineering

 


http://www.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm



 
 
 
 

Annual Report 2000


ISLE is the latest in a series of projects under the successful EAGLES initiative (Expert Advisory Group for Language Engineering Standards). It extends hitherto EU-based EAGLES work within the EU-US international research cooperation framework, and was set up as a result of two years of joint preparatory work towards an international, standards-oriented HLT initiative.

The overall goals of ISLE are to support HLT and national projects, and HLT industry in general, by developing, disseminating and promoting widely agreed and urgently needed HLT de facto standards and guidelines, for infrastructural language resources, for the tools that exploit them, and for HLT products. The areas currently targeted by ISLE are: multilingual computational lexicons, natural interaction and multimodality, and evaluation of HLT systems.

A feature of EAGLES/ISLE work is the close interaction between industry and academia, users and providers, funders and beneficiaries.
 

Summary of 2000 Activities

Major advances have been made in understanding the nature of the multilingual lexical mapping and representation problem. These were made possible by a wide-ranging survey of bi- and multilingual lexicons (publishers' dictionaries as well as those of HLT systems and projects). As a result, initial agreement has been reached on the basic requirements of a multilingual ISLE lexical entry (MILE), which will underpin the development of a fully worked-out MILE.

Work on evaluation of HLT systems in ISLE focuses on machine translation (MT). The major result this year is the creation of a web site that classifies different aspects of MT evaluations; it can be used by those wishing to use an MT system, to evaluate MT systems, or to design or upgrade an MT system.

Work on natural interaction and multimodality led to a significant workshop at LREC 2000, where an international audience helped advance towards consensus on metadata representation and annotation for multimodal/multimedia language resources.

An important development took place at the EU-US CLWG meeting held at the University of Pennsylvania in December 2000: two representatives from Japan (Kyoto University and TIT) and one from Taiwan (Academia Sinica) were invited to participate. The UPenn contingent also included several Korean researchers, who presented work on Korean/English bilingual dictionaries.
 

Multilingual Computational Lexicons

Work towards developing MILE has been oriented towards the needs of several key HLT applications: MT, cross-language information retrieval, cross-language information extraction, multilingual language generation, multilingual authoring and speech-to-speech translation. It is based on previous EAGLES work on monolingual lexicon standards.

A number of important bi- and multilingual lexical resources were identified. Each resource has been investigated to determine its lexical structure and how it encodes cross-language relationships. This has resulted in a major survey report, documenting results in a format that allows easy comparison of the lexical mechanisms employed by each lexicon considered. The emphasis is on the semantic level of description, as befits a multilingual perspective; however, other levels of lexical information are also taken into account where these have a bearing on interpretation and on cross-language mapping. This work was undertaken to enable the subsequent identification of a set of basic notions needed to describe the multilingual level, together with other notions that may be recommended for particular purposes or languages.

In order to facilitate the development of MILE, a joint EU-US meeting agreed that the lexical model of PAROLE/SIMPLE, based on previous EAGLES recommendations, would be taken as a starting point, and modified as necessary in the light of the survey results.

A tool to manage computational lexicons modelled according to ISLE recommendations is being developed.
 

Evaluation of HLT Systems

The focus of work on evaluation has been on methods and metrics for MT (earlier EAGLES work had looked at other application areas). This has involved investigation of the various published evaluations of MT systems that have been carried out since 1979. However, this work is not being pursued in isolation, as MT is being used as a case study, to enable the later development of: a general theory about the methodology for evaluating HLT applications; and a general framework that can accommodate existing evaluation measures for specific HLT applications. As a step towards this goal, a specific framework for classifying MT evaluations has been elaborated, illustrating how the current state of the ISLE evaluation methodology can be applied.

This has involved the development of not one, but three parallel taxonomies that describe relevant aspects of the nature and use of MT: characteristics of machine translation purpose, characteristics of the machine translation process, and general software characteristics. In addition, individual evaluation measures have been identified and classified into appropriate groupings, and criteria have been developed for the application of each measure.

The results of this work are embodied in a web site that is intended to help three types of user: people who want to use an MT system; those who want to evaluate various MT systems; and those who want to design a new MT system or upgrade an existing one.

A workshop on MT Evaluation was held in conjunction with LREC 2000, and another on Hands-On Evaluation at AMTA 2000.
 

Natural Interaction and Multimodality

Work in this period has concentrated primarily on the need for standards regarding metadata aspects of language resources, i.e. their representation and annotation in a standard way. This work again takes inspiration from earlier EAGLES work. However, the earlier work has been extended to consider the needs of more than textual and spoken language resources, given the growing importance of natural interaction with information systems and the multimodal nature of such interaction, involving multimedia.

In order to further this work, an international workshop was held at LREC 2000, the major biennial event on language resources.

A workshop on Web-Based Language Documentation and Description was held in Philadelphia on December 12th-15th 2000. It was the initiative of two members of the US ISLE group and was jointly sponsored by IRCS (the Institute for Research in Cognitive Science of the University of Pennsylvania), ISLE and the NSF TalkBank project. It was attended by over 100 international participants from a wide range of industrial, academic and governmental areas. Links have been established with DublinCore, MPEG7 and W3C.
 

User Group, Promotion and Awareness

ISLE, given its standards-oriented profile, is committed to dissemination activities with a view to engendering and enhancing consensus regarding its recommendations and guidelines. Major events that project members took part in included:

LREC 2000 (numerous papers by EAGLES/ISLE participants)
Workshop on the Evaluation of Machine Translation
First EAGLES/ISLE Workshop on Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources and Data Architectures and Software Support for Large Corpora
Workshop on Hands-On Evaluation of Machine Translation
Workshop on Web-Based Language Documentation and Description (sponsored by IRCS, ISLE and TalkBank)
 

Future Work

Work towards the development of MILE will continue. Draft guidelines will be available towards the end of 2001, together with a prototype tool that will enable construction and validation of entries in the MILE format.

Regarding evaluation, a workshop on MT evaluation will be held in Geneva, 19th-24th April 2001. Draft guidelines on evaluation will likewise be available towards the end of 2001.

The EU-US CLWG meeting held at the University of Pennsylvania in December 2000 led to a commitment by US ISLE to fund the participation of several Asian representatives at future meetings, as a means of preparing the way for substantial Asian involvement in the expanding ISLE initiative. Draft results of the CLWG survey were sent to an Asian Federation meeting to encourage further feedback and discussion on MILE. A main point of interest in the opening to Asia is the Japanese project on Multimedia Annotation (MMA): exchanges of data and information with this project are planned to take place at various meetings, concerning both the CLWG and the NIMMWG.
 

Further Information

The EAGLES Secretariat welcomes feedback and enquiries regarding the work of ISLE.

EAGLES Secretariat
CNR - Consiglio Nazionale delle Ricerche
ILC - Istituto di Linguistica Computazionale
Area della Ricerca di Pisa San Cataldo
Via Moruzzi N° 1
56124 Pisa
ITALY
Phone: [+39] 050 315 2873
Fax 1: [+39] 050 315 2834
Fax 2: [+39] 050 315 2839

ISLE reports are placed on the ISLE web server as they receive approval for dissemination (i.e. as they are considered to represent a consensus view). Further details of the project together with links to earlier EAGLES work may also be found at this location.