A study of semantic shift in the historical evolution of the Italian language using neural language models

In-house seminar

The seminar presents a study, carried out as an experimental thesis project, that aims to build a computational method for handling diachronic change of meaning. The method is based on BERT (Bidirectional Encoder Representations from Transformers) language models, with particular emphasis on historical varieties of Italian. The study tests the hypothesis that clustering, applied to the contextual embeddings produced by BERT models, is the most suitable tool for this kind of investigation: the recent literature points to it as a possible proxy, but has not conclusively demonstrated it. The experimental approach seeks to establish which computational investigation techniques are the most recent, which models the literature takes as reference for this task, which tools are the most effective, and how they are implemented and with what results. The presentation retraces the work phases that led to the construction of the method and all the experiments conducted, trying to answer questions such as: how does clustering relate to the representation of the senses of a word? That is, does the distance between clusters necessarily indicate that the words belonging to one are semantically ‘distant’ from those belonging to another? Shifting the focus from the investigation tool to its object, the study then analyzes what these tools actually aggregate and, in particular, tries to understand how much the context contributes to the interpretation of the meaning of the words in a sentence.
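As a rough illustration of the general idea of clustering contextual embeddings to study semantic shift, the following is a minimal sketch and not the method presented in the seminar. It makes several assumptions: hand-written 2-D vectors stand in for BERT embeddings of one target word, k-means is used as the clustering algorithm, and Jensen-Shannon divergence between the cluster distributions of two periods is used as the shift score. All names (`kmeans`, `cluster_distribution`, `jensen_shannon`, `old_period`, `new_period`) are hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain k-means with farthest-first initialization (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_distribution(labels, k):
    """Relative frequency of each cluster among the given labels."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(k)]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical distributions)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy stand-ins for contextual embeddings of one word in two periods.
old_period = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1)]
new_period = [(0.1, 0.1), (-0.1, -0.1), (5.0, 5.1), (5.1, 5.0)]

k = 2
labels = kmeans(old_period + new_period, k)
old_labels = labels[: len(old_period)]
new_labels = labels[len(old_period):]

shift_score = jensen_shannon(
    cluster_distribution(old_labels, k),
    cluster_distribution(new_labels, k),
)
print(f"shift score: {shift_score:.3f}")
```

In this toy setting, a score near 0 means the word's occurrences fall into the same clusters in both periods, while a higher score means a cluster has appeared or disappeared over time. Whether such clusters actually correspond to word senses is exactly the kind of question the seminar raises.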

Speaker(s): Eva Sassolini

She has been an employee of the Institute for Computational Linguistics “A. Zampolli” of the National Research Council of Italy since 2008. She has many years of IT experience in developing and adapting text analysis, in text indexing and processing, and in building tools for the acquisition and management of textual corpora. She has experience in the morphosyntactic annotation of texts and in the processing of structured texts and, for both, she has developed query and analysis tools. She has gained solid experience in methods and procedures for the long-term preservation of digital archives of historical and cultural value, as well as in the management of annotated texts and their conversion into international representation standards. Over the years she has held roles of scientific responsibility in the institute's long-standing collaborations, such as those with the Accademia della Crusca in Florence, the Opera of Santa Maria del Fiore in Florence and the Department of Legal Sciences of the University of Rome “La Sapienza”. She has served as technical and IT manager in the institute's collaborations with the Universidad Nacional de Educacion a Distancia (UNED) in Madrid, the Museo Galileo of Florence, the RAI Technological Strategies Department of Rome (STRAT) and the RAI Research Centre in Turin (CRIT). With the same type of responsibility she has collaborated on national and international projects, starting with projects no. 6, 7 and 8 funded under the research investment programme related to Law no. 488 of 1999.