http://www.ilc.pi.cnr.it/EAGLES96/isle/ISLE_Home_Page.htm
ISLE is the latest in a
series of projects under the successful EAGLES initiative (Expert Advisory
Group for Language Engineering Standards). It extends hitherto EU-based EAGLES
work within the EU-US international research cooperation framework, set up as a
result of two years of joint preparatory work towards an international HLT
standards-oriented initiative.
The overall goals of ISLE
are to support HLT and national projects, and HLT industry in general, by
developing, disseminating and promoting widely agreed and urgently needed HLT
de facto standards and guidelines, for infrastructural language resources, for
the tools that exploit them, and for HLT products. The areas currently targeted
by ISLE are: multilingual computational lexicons, natural interaction and
multimodality, and evaluation of HLT systems.
A feature of EAGLES/ISLE
work is the close interaction between industry and academia, users and
providers, funders and beneficiaries.
ISLE has made substantial
progress in its three spheres of interest. In multilingual lexicons, the
initial survey phase has been followed by a phase of working intensively
towards specification of the Multilingual ISLE Lexical Entry (MILE). This
has involved focussing on a complex word-pair, to gain insights into
possibilities for word sense representation and cross-language linkages, on
extracting and classifying sense indicators, and on developing a prototype tool
to manage MILE-based lexicons. In evaluation of HLT systems, user feedback
obtained through three international workshops has led to a second, refined version
of the ISLE evaluation framework. In natural interaction and multimodality
(NIMM), major surveys have been completed of resources, annotation schemes and
tools, and metadata descriptions and tools. A prototype tool has been developed
to annotate NIMM data. XML schemas have been developed to handle ISLE metadata
descriptions, and tools to allow editing and browsing of these descriptions,
including across distributed resources. As a whole, the project has been active
in dissemination and awareness activities on an international scale.
Work
towards developing MILE has been oriented towards the needs of several key HLT
applications: MT, cross-language information retrieval, cross-language
information extraction, multilingual language generation, multilingual
authoring and speech-to-speech translation. It is based on previous EAGLES work
on monolingual lexicon standards.
Following on from a major survey of important bi- and multilingual lexical
resources, ISLE has been working towards recommendations for MILE bilingual
dictionary entries, focussing first on
development of a prototype entry for a complex Italian-English word pair. This
has involved intensive exploitation of available resources such as SIMPLE
(based on previous EAGLES recommendations), COMLEX and the British National
Corpus. Much work has also been done on extracting and classifying
sense indicators used in bilingual dictionaries: analysis of these indicators
is expected to lead to more precise specification of bilingual transfer
conditions, a hypothesis that has already been tested in work on the
above-mentioned complex
word pair. At the same time, work to define the MILE lexical data model has
started, and existing frameworks and representation schemes have been surveyed.
The MILE will be based on an entity-relation model, whose main architecture has
already been designed. The model will also include a first repository of common
“lexical templates” for the fast development of multilingual lexical entries.
Important experience and feedback were gained through involvement in data
preparation for evaluation of 90 word sense disambiguation systems, covering 12
languages, conducted in SENSEVAL2.
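The entity-relation model and the role of sense indicators described above can be illustrated with a minimal sketch. It is not the MILE specification: the class names, fields and the Italian-English example (`piano`, `floor`, `plan`) are assumptions made purely for illustration, and the word pair actually studied by ISLE is not named here. The sketch treats senses as entities, cross-language correspondences as relations, and sense indicators as conditions that must hold in the context before a correspondence applies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a lemma in one language (hypothetical structure)."""
    sense_id: str
    gloss: str


@dataclass
class LexicalEntry:
    """A monolingual entry: a lemma plus its senses."""
    lemma: str
    language: str
    senses: dict[str, Sense] = field(default_factory=dict)

    def add_sense(self, sense_id: str, gloss: str) -> Sense:
        sense = Sense(sense_id, gloss)
        self.senses[sense_id] = sense
        return sense


@dataclass
class Correspondence:
    """A cross-language relation between two senses, guarded by sense indicators."""
    source: Sense
    target: Sense
    indicators: frozenset[str] = frozenset()

    def applies(self, context: set[str]) -> bool:
        # The transfer condition holds when every indicator occurs in the context.
        return self.indicators <= context


def translate(links: list[Correspondence], source: Sense, context: set[str]) -> list[Sense]:
    """Return the target senses whose transfer conditions the context satisfies."""
    return [link.target for link in links if link.source is source and link.applies(context)]


# Invented example data, not taken from any ISLE resource.
piano = LexicalEntry("piano", "it")
floor = LexicalEntry("floor", "en")
plan = LexicalEntry("plan", "en")

piano_storey = piano.add_sense("piano#1", "storey of a building")
piano_scheme = piano.add_sense("piano#2", "scheme of action")
floor_storey = floor.add_sense("floor#1", "storey of a building")
plan_scheme = plan.add_sense("plan#1", "scheme of action")

links = [
    Correspondence(piano_storey, floor_storey, frozenset({"building"})),
    Correspondence(piano_scheme, plan_scheme, frozenset({"project"})),
]

matches = translate(links, piano_storey, {"building", "third"})
print([sense.sense_id for sense in matches])
```

Keeping the indicators on the relation rather than on either entry means the monolingual entries stay reusable across language pairs, which is one plausible reading of why an entity-relation design suits a multilingual lexicon.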
A prototype tool to manage computational lexicons
modelled according to ISLE recommendations has been developed. A public version
will be available next year. Another tool is being developed to browse semantic
indicators extracted from machine readable bilingual dictionaries, to support
the MILE-based lexicographer.
A workshop was held in
Pisa, Italy, bringing together ISLE members and also representatives of
industry and academia from several countries outside Europe and the USA,
notably from Asia.
The focus
of work on evaluation is on methods and metrics for Machine Translation (MT)
(earlier EAGLES work had looked at other application areas). This has involved
investigation of the various published evaluations of MT systems that have been
carried out since 1979. However, this work is not being pursued in isolation, as
MT is being used as a case study, to enable the later development of: a general
theory about the methodology for evaluating HLT applications; and a general
framework that can accommodate existing evaluation measures for specific HLT
applications. A second version of a specific framework for classifying MT
evaluations has been elaborated, illustrating how the current state of the ISLE
evaluation methodology can be applied.
This has involved the
development of not one, but three parallel taxonomies that describe relevant
aspects of the nature and use of MT: user purpose, application process and
general software characteristics. This year, the framework has been further
refined and further populated with individual evaluation measures, and
associated criteria for the application of each measure. Where appropriate,
concepts, practices and techniques are linked to published evaluations. The
work builds on ISO 9126 and ISO 14598.
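As a rough illustration of how three parallel taxonomies populated with measures might be held in software, the sketch below models each taxonomy as a tree whose nodes can carry evaluation measures. It is not the ISLE framework itself: the `Node` class, the node under "user purpose" and the measure names are hypothetical. Only the three taxonomy names come from the text above, and "functionality", "accuracy" and "usability" are quality characteristics defined in ISO 9126.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy node that may carry evaluation measures (illustrative only)."""
    name: str
    measures: list[str] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    def add(self, name: str, measures: list[str] | None = None) -> Node:
        """Attach and return a child node."""
        child = Node(name, list(measures or []))
        self.children.append(child)
        return child

    def find(self, name: str) -> Node | None:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def all_measures(self) -> list[str]:
        """Collect the measures attached to this node and all its descendants."""
        found = list(self.measures)
        for child in self.children:
            found.extend(child.all_measures())
        return found


# The three taxonomies named in the text; their contents here are invented.
purpose = Node("user purpose")
purpose.add("choose an MT system")
process = Node("application process")
software = Node("general software characteristics")

functionality = software.add("functionality")
functionality.add("accuracy", ["hypothetical adequacy score"])
software.add("usability", ["hypothetical post-editing effort"])

framework = {tree.name: tree for tree in (purpose, process, software)}
print(framework["general software characteristics"].all_measures())
```

A tree per taxonomy keeps the three views independent while still allowing a measure to be looked up from whichever view a user starts in.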
The results of this work
are embodied in a web site that is intended to help three types of user:
people who need guidance in choosing an MT system; those who want to compare
various MT systems; and those who want to design a new MT system or upgrade an
existing one. Each part of the framework has its own comments page, which
allows the wider community to participate strongly in helping to develop the
framework.
An open workshop (documents available) on practical MT
evaluation using the initial framework took place in April 2001, in Geneva,
which aided development of the second version of the framework. Another 1-day
workshop was held in Pittsburgh, USA, again involving
practical hands-on experience of MT evaluation. Industry was strongly
represented in both workshops. A third major workshop at the MT Summit VIII served to
synthesise results of the first two and to arrive at preliminary conclusions
that will then feed into the end-of-project guidelines.
NIMM is concerned with extending previous
EAGLES work to cater for the needs of more than textual and spoken language
resources, given the growing importance of natural interaction with information
systems and the multimodal nature of such interaction, involving multimedia.
Work in this period has been on finalising surveys of NIMM data resources,
annotation schemes and tools (contribute your own tool description for the survey), and
metadata aspects of multimodal language resources,
with a view to elaborating guidelines on the representation, annotation and
description of such resources in a standard way.
NIMM sites are available in Europe and the USA.
User requirements for a tool for NIMM data annotation have been drawn up and
prototype tool elements have been developed on the basis of these.
Further information specifically on the ISLE Metadata Initiative (IMDI) is available, including XML schemas for session and catalogue
metadata descriptions, for an IMDI-OLAC mapping and for description of lexicon
metadata; a user-friendly editor for metadata descriptions (including
converters from legacy data); and a tool to allow browsing in complex,
distributed metadata descriptions and launch of other tools on retrieved resources.
A demonstration of this latter tool, involving
resources from 6 European institutes and prepared for the opening of the
European Year of Languages, is available.
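To give a feel for what browsing XML metadata descriptions involves, the following minimal sketch filters a small catalogue of session descriptions by language and modality. The element and attribute names (`catalogue`, `session`, `language`, `modality`, `site`) and the sample data are invented for illustration and are not taken from the IMDI schemas; the real schemas should be consulted for the actual vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; names and values are assumptions, not IMDI markup.
SAMPLE = """
<catalogue>
  <session name="demo-001" site="institute-a">
    <language>Italian</language>
    <modality>speech</modality>
    <modality>gesture</modality>
  </session>
  <session name="demo-002" site="institute-b">
    <language>English</language>
    <modality>speech</modality>
  </session>
  <session name="demo-003" site="institute-b">
    <language>Italian</language>
    <modality>speech</modality>
  </session>
</catalogue>
"""


def find_sessions(xml_text, language=None, modality=None):
    """Return the names of sessions matching the given language and modality."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for session in root.iter("session"):
        languages = {element.text for element in session.findall("language")}
        modalities = {element.text for element in session.findall("modality")}
        if language is not None and language not in languages:
            continue
        if modality is not None and modality not in modalities:
            continue
        hits.append(session.get("name"))
    return hits


print(find_sessions(SAMPLE, language="Italian"))
print(find_sessions(SAMPLE, language="Italian", modality="gesture"))
```

In a distributed setting the same filter could be applied to descriptions fetched from several sites and the results merged, but that step is left out here.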
A workshop on linguistic databases, in Philadelphia, USA,
included consideration of work on ISLE standards.
ISLE, given
its standards-oriented profile, is committed to dissemination activities with a
view to engendering and enhancing consensus regarding its recommendations and
guidelines. This year has seen ISLE also establishing and developing contacts
with countries in Asia and the Americas. Steps have already been taken to
formalise cooperation with Asian countries. Events that project members took
part in included:
1st International Workshop on Multimedia Annotation, Tokyo, Japan
Vilem Mathesius Lecture Series 16, Prague
Workshop on XML Markup Technologies for Working with Linguistic Data, Edinburgh, UK
Workshop on Multimodal Communication and Context in Embodied Agents, Montreal, Canada
International E-MELD Meeting, Santa Barbara, California, USA
Publications by the
project, apart from those associated with project-organised workshops mentioned
above, include:
In the last
phase of the project, work will be concentrated on producing draft guidelines
for best practice in the areas being covered by ISLE and also on refining and
documenting the tools and exemplary resources that are intended to help users
in applying the guidelines.
Further contact and dissemination activities are planned to ensure the widest
possible feedback on the guidelines and to increase the level of cooperation
with other countries.
The EAGLES Secretariat welcomes feedback and enquiries regarding the work of ISLE.
EAGLES Secretariat
CNR - Consiglio Nazionale delle Ricerche
ILC - Istituto di Linguistica Computazionale
Area della Ricerca di Pisa San Cataldo
Via Moruzzi N° 1
56124 Pisa
ITALY
Phone: [+39] 050 315 2873
Fax 1: [+39] 050 315 2834
Fax 2: [+39] 050 315 2839
ISLE reports are placed on
the ISLE web server as they receive approval for
dissemination (i.e. as they are considered to represent a consensus view).
Further details of the project together with links to earlier EAGLES work may
also be found at this location.