GDLIplus

A New Resource for the History of Italian: The Corpus of the Quotes in the «Grande dizionario della lingua italiana»

Type of project: Regional  |  Start date: 01/10/2025  |  End date: 30/09/2027

Published in 21 volumes between 1961 and 2002, the «Grande dizionario della lingua italiana» (GDLI) is the most important historical dictionary of the Italian language. Like all historical dictionaries, the GDLI bases its lexicographical description of words on a rich collection of quotes, which together cover the entire history of the Italian language.

Thanks to the digitization of the GDLI carried out by the Cnr-Istituto di Linguistica Computazionale “Antonio Zampolli” (CNR-ILC) in collaboration with the Accademia della Crusca, we can estimate that the corpus of quotes (Corpus GDLIplus) includes over two and a half million entries, taken from more than 14,000 sources (and over 6,000 authors), with a total of about 50 million occurrences.

Italian long remained a primarily “written” language: at least until I Promessi Sposi, the history of Italian is in fact the history of literary Italian. The Corpus GDLIplus can therefore be considered a formidable resource for the history of the Italian language, useful not only to scholars but also to teachers and students, and even to general Internet users. The GDLIplus project aims to create this resource.

To achieve this goal, two main activities are required:

  1. The corpus must be annotated: each word must be associated with linguistic information (lemma and morpho-syntactic category). Despite recent advances, methods and techniques for automatic language processing cannot be applied directly to historical texts and need to be adapted at various levels.
  2. The lexicographical origin of the texts in the corpus poses specific management challenges. The most significant issue concerns cases where the same textual passage is cited multiple times under different entries. Implementing the Corpus GDLIplus requires developing a strategy for managing repeated examples, and, even before that, establishing a method for their automatic identification.
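
As an illustration of what automatic identification of repeated examples might involve, the following Python sketch flags pairs of quotes that share most of their word trigrams. It is only a minimal sketch under stated assumptions, not the method adopted by the project: the function names, the trigram size, the 0.8 threshold and the use of a containment score are all hypothetical choices, and the three sample records (with their invented identifiers) are toy data, not taken from the Corpus GDLIplus.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def shingles(tokens, n=3):
    """Return the set of word n-grams; short texts yield a single shingle."""
    if len(tokens) <= n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_repeated(quotes, n=3, threshold=0.8):
    """Return (id_a, id_b, score) for pairs whose containment >= threshold.

    `quotes` maps a quote id to its text. Containment is the share of the
    smaller shingle set that also occurs in the larger one, so a short
    excerpt can match a longer citation of the same passage.
    """
    sets = {qid: shingles(tokenize(text), n) for qid, text in quotes.items()}

    # Inverted index: only quotes sharing at least one shingle are compared.
    index = defaultdict(set)
    for qid, shingle_set in sets.items():
        for shingle in shingle_set:
            index[shingle].add(qid)

    candidates = set()
    for ids in index.values():
        candidates.update(combinations(sorted(ids), 2))

    results = []
    for a, b in sorted(candidates):
        shared = len(sets[a] & sets[b])
        score = shared / min(len(sets[a]), len(sets[b]))
        if score >= threshold:
            results.append((a, b, round(score, 2)))
    return results


# Toy data: hypothetical ids of the form "headword#n" (an assumption).
quotes = {
    "cammino#1": "Nel mezzo del cammin di nostra vita "
                 "mi ritrovai per una selva oscura",
    "selva#1": "mi ritrovai per una selva oscura",
    "ramo#1": "Quel ramo del lago di Como, che volge a mezzogiorno",
}

print(find_repeated(quotes))  # [('cammino#1', 'selva#1', 1.0)]
```

Containment is used here instead of a symmetric measure such as Jaccard similarity because a passage cited under different entries may be cut to different lengths. Historical spelling variation would probably call for further normalization, which this sketch does not attempt.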

Acronym:
GDLIplus

Funding programme:
Programma regionale FSE+ 2021-2027

Funding body:
Regione Toscana | Accademia della Crusca

Status:
Ongoing

CNR-ILC role:
Coordinator

Project coordinator:
Elisa Guadagnini (CNR-ILC)

Staff:
Marco Biffi, Scientific lead for the Accademia della Crusca
Eva Sassolini (CNR-ILC)
Simonetta Montemagni (CNR-ILC)
Manuel Favaro (CNR-ILC)
Noemi Terreni (CNR-ILC)