Language Resources, Standards and Research Infrastructures
Research in the field of Language Engineering and the production cycle of language resources are optimized through the adoption of standards, the exchange of good practices for interoperability, and the re-use of existing data and tools. The goal of this strategic line is to define models for the creation, representation, extension and maintenance of computational lexicons, terminological and ontological collections, corpora and language technologies. The technological solutions found in this area are aimed at developing a distributed and cooperative research infrastructure that provides new ways to access, interoperate with and share language resources and tools.
Natural Language Processing and Knowledge Extraction
The techniques developed make it possible to automatically access the contents of a text and to meet a wide range of speakers' information needs: from semantic-based access to textual content to the evaluation of text structure as an indicator of accessibility and communicative effectiveness. The proposed technological solutions respond to the requirements of searching and "smart" management of the information contained in large document databases, and can be used in numerous commercial applications to meet the needs of society.
(Bio-)Computational Models of Language Usage
This line analyzes the factors that govern the processes of language comprehension, production, learning and variation, and the dynamic interactions among them. In particular, theoretical models of language usage are developed and empirically verified through: probabilistic methods for the study of corpora, lexicons and databases; computational simulations; and the study of experimental, clinical and acquisitional linguistic evidence. Methods of formal representation and symbolic modeling are combined with methods, data and tools from disciplines more oriented to the analysis of language usage in goal-oriented and controlled contexts, such as psycho- and neuro-linguistics, sociolinguistics and language teaching.
Digital Humanities
The findings and knowledge of computer science are combined with the methodological approaches and theoretical models of text analysis and philology, thus contributing to the transformation of the methods used to conserve, access, study and publish literary, archival and library documents. The technological solutions implemented offer new perspectives for investigation and sharing. They are integrated in a "multi-modular" system of independent but interconnected components, so that the various methods of accessing, managing, studying and revising a text can benefit from opportunities for interaction and integration.
Some of ILC's specific competences are:
- standardization of language resources;
- design and construction of computational lexicons for different languages;
- design and construction of textual and multimodal corpora;
- design and construction of ontologies;
- research infrastructure for the validation and distribution of language resources and technologies;
- preservation of minority languages;
- Digital Humanities;
- Machine Learning;
- Information Extraction and Retrieval;
- platforms for multi-level automatic linguistic annotation of texts;
- methods and techniques for linguistic simplification;
- methods and techniques for the evaluation of linguistic skills;
- extraction of terminological and ontological knowledge from domain-specific documentary databases;
- systems for Sentiment Analysis and Opinion Mining;
- technologies for the Semantic Web;
- computational models of language usage;
- systems for Computer Assisted Translation;
- platforms for textual analysis.