Computational Linguistics (CL) is an intrinsically interdisciplinary research field built on the synergy of diverse skills and expertise, drawn mainly from Linguistics and Computer Science. This is reflected in the classification used by the European Research Council (ERC), which places CL in two domains: Social Sciences and Humanities (panel “SH4 The Human Mind and Its Complexity”) and Physical Sciences and Engineering (panel “PE6 Computer Science and Informatics”).

Within the Social Sciences and Humanities, CL is now called upon to act as an “interface” between the language and text sciences: for example, between Theoretical and Typological Linguistics, between the History of Language and Philology, and between Psycholinguistics and Neurolinguistics.

At the same time, CL serves as an interface between the Language Sciences as a whole and Artificial Intelligence (AI), helping to redefine the goals and methods of both.

The research activities of the Cnr-Istituto di Linguistica Computazionale “Antonio Zampolli” focus on the following macro-areas:

Digital Humanities

Development of models, methods and techniques for the preservation, intelligent use, linguistic study (diachronic, synchronic, comparative) and philological study (ecdotic and interpretative) of texts of interest to the Social Sciences and Humanities, with a focus on historical and literary texts.

Advances in Computer Science are combined with the methodological approaches and theoretical models of Text Analysis and Philology, helping to transform the ways in which literary, archival and library documents are preserved, used, studied and published.

Natural Language Processing and Knowledge Management

Development of methods, models and techniques based on symbolic algorithms, probabilistic algorithms and neural networks for Natural Language Processing (NLP) across the different varieties of language use, with a focus on Italian, and for extracting and representing the knowledge encoded in texts.

The proposed technological solutions address the need for “intelligent” information retrieval and management in large, continuously evolving document collections, and can be used in numerous applications that serve the needs of society.

Language Resources, Standards and Research Infrastructures

Development and management of language resources (computational lexicons, terminological and ontological repositories, corpora), with a focus on the representation of data according to international standards that guarantee their sharing, interoperability and long-term preservation in line with Open Science principles.

The technological solutions developed in this area aim to build a distributed, cooperative research infrastructure that provides new functionalities for accessing, sharing and ensuring the interoperability of language resources and tools.

(Bio-)Computational models of language usage

Analysis of the factors governing language comprehension, production, learning and variation, and of the dynamic interactions between these processes. In particular, theoretical models of language use are developed and empirically tested through: probabilistic methods for the study of corpora, lexicons and databases; computational simulations; and the study of experimental, clinical and acquisitional linguistic evidence.

Formal representation and symbolic modelling methodologies are combined with the methods, data and investigative tools of fields more focused on the analysis of language use in purposeful and controlled contexts, such as Psycholinguistics, Neurolinguistics, Sociolinguistics and Language Teaching (Glottodidactics).

Main research topics

  • text analysis
  • automatic multilevel linguistic annotation of text
  • textual and multimodal, monolingual and multilingual corpora
  • Digital Humanities
  • knowledge extraction from domain document bases
  • digital philology
  • information extraction and retrieval
  • research infrastructures
  • monolingual and multilingual computational lexicons
  • digital lexicography
  • minority languages
  • machine learning and deep learning
  • computational models of language use
  • terminological repositories and ontologies
  • semantic web
  • linguistic simplification
  • sentiment analysis and opinion mining
  • definition of representation standards for language resources
  • text mining
  • Computer-Assisted Translation
  • Natural Language Processing (NLP)
  • assessment of language competence