SAVAS - CNR-ILC

Type of project: European | Start date: 01/05/2012 | End date: 30/04/2014

The SAVAS project collected spoken and textual resources in six European languages and built domain-specific Large Vocabulary Continuous Speech Recognizers (LVCSR) to meet the automated subtitling needs of the media industry.

More specifically, the main objectives of the project were:

to make the acquisition and annotation of audiovisual language resources produced by broadcasters and subtitling companies more effective for the development of LVCSR systems targeting automated subtitling;
to deploy a platform for sharing audiovisual language resources between the media industry and LVCSR developers, using the legal and business data trading approaches best suited to the media industry;
to show the impact of feeding LVCSR technology with existing audiovisual language resources for automated subtitling purposes.

In order to achieve these goals, SAVAS:

collected spoken and textual resources in the languages addressed from the broadcasters and subtitling companies acting as data providers within the consortium;
transcribed and annotated the collected corpora, using a combination of automatic and collaborative approaches, into a form suitable for training the acoustic and language models of LVCSR systems;
built a local META-SHARE repository containing the collected and annotated SAVAS language resources to allow their reuse;
adapted and trained dictation and transcription LVCSR systems with the SAVAS language resources;
integrated the developed systems into several automated subtitling application scenarios and evaluated them there, in order to show the impact of audiovisual data sharing on automated subtitling.

Acronym:
SAVAS

Funding programme:
7th Framework Programme

Funding body:
European Commission

Grant agreement:
FP7-ICT-2011-SME-DCL-296371

Status:
Ended

CNR-ILC Research Unit Chair:
Monica Monachini

Staff:
Paola Baroni
Francesca Frontini

Website:
http://www.fp7-savas.eu