Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies

Type of project: European  |  Start date: 01/01/2010  |  End date: 31/12/2012

A strategic challenge for Europe in today’s globalised economy is to overcome language barriers through technological means.

In particular, Machine Translation systems are expected to have a significant impact on the management of multilingualism in Europe. They make it possible to translate the huge quantity of written and spoken data being produced, and thus to meet the needs of hundreds of millions of citizens.

The PANACEA project addressed the most critical aspect of Machine Translation: the so-called language-resource bottleneck.

Language Technologies depend on the availability of language-dependent knowledge for their real-life implementation; that is, they require Language Resources.

In addition, Language Resources for a given language can never be considered complete or final, because of the nature of natural language itself: language changes, and new knowledge domains and new language varieties keep emerging.

This constant need for a supply of Language Resources can only be met by an automatic, dynamic and adaptive system for compiling, producing and validating them: a system conceived as an integrated machinery for the production of Language Resources.

The objective of PANACEA was to build a factory of Language Resources that progressively automates the stages involved in the acquisition, production, updating and maintenance of the Language Resources required by Machine Translation systems and other Language Technology applications, within the time frames those applications require.

This automation can significantly reduce the cost, time and human effort involved.

PANACEA employed novel methods to automatically learn lexical and grammatical knowledge from large amounts of text and to present that knowledge in a form readily exploitable by automatic translation systems and other Language Technology applications.

The modules performing these tasks were integrated into a web-based virtual “factory”, which forms a standardized, language-independent and scalable backbone into which the individual functionalities can be plugged.

The goal is a virtual “factory” able to produce the desired Language Resources with the minimum expected quality and coverage, and with little or no human intervention.

The “factory” was validated by producing a pre-defined set of Language Resources and evaluating their quality.

PANACEA’s highly automated production of high-quality Language Resources can revolutionize many fields of Language Technology.

In particular, it can improve the availability and quality of Machine Translation systems for all languages for which sufficient volumes and types of corpora are available.

The results and future plans of PANACEA were presented at major Language Technology conferences (LREC, ACL, COLING, STATMT, MT Summit).

PANACEA organised two targeted workshops: one on scientific issues and one on technology transfer.


Funding programme:
7th Framework Programme

Funding body:
European Commission

Grant agreement:


CNR-ILC role:

Alessandro Enea
Monica Monachini
Claudia Soria
Roberto Bartolini
Paola Baroni
Valeria Quochi
Riccardo Del Gratta
Irene Russo
Francesca Frontini
Irina Prodanof
Tommaso Caselli
Francesca Strik Lievers