ITALWORDNET

Rete Semantico-Lessicale per la Lingua Italiana (Lexical-Semantic Network for the Italian Language)

Type of project: National  |  Start date: 14/11/2014  |  End date: 14/11/2014

ItalWordNet (IWN) is a lexical-semantic database developed in the framework of two different research projects: EuroWordNet (EWN) and Sistema Integrato per il Trattamento Automatico del Linguaggio (SI-TAL), a national project devoted to the creation of large linguistic resources and software tools for processing written and spoken Italian.

Among the resources developed in SI-TAL, IWN was built as the reference semantic database by extending the Italian wordnet developed within the EWN project.

In the framework of EWN, a linguistic model providing a rich set of semantic relations was designed [Alonge et al. 1998] and the first nucleus of data (verbs and nouns) was encoded [Roventini et al. 1998].

The wordnet is structured in the same way as the Princeton WordNet, namely around the notion of a synset, or set of synonymous word meanings (according to a very broad concept of synonymy: meanings must be interchangeable in at least one context).

In addition to the language-internal relations, equivalence relations were also encoded between Italian synsets and the closest concepts in an Inter-Lingual Index (ILI), a separate language-independent module containing all WN1.5 synsets but not the relations among them.
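The structure described above can be pictured with a minimal Python sketch. It is an illustration only, not the actual IWN storage format: the class name `Synset`, the helper `add_relation`, the identifier scheme and the ILI record id are all hypothetical, and the relation labels are used here only as example strings.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of word meanings treated as synonymous (hypothetical layout)."""
    synset_id: str   # hypothetical identifier scheme
    pos: str         # part of speech, e.g. "n" for noun
    lemmas: list[str]
    # Language-internal relations: relation label -> ids of target synsets
    relations: dict[str, list[str]] = field(default_factory=dict)
    # Equivalence links to the ILI: (relation label, ILI record id)
    ili_links: list[tuple[str, str]] = field(default_factory=list)


def add_relation(synsets, source_id, relation, target_id):
    """Record a language-internal relation from one synset to another."""
    synsets[source_id].relations.setdefault(relation, []).append(target_id)


# Toy data: two Italian noun synsets and one placeholder ILI record id.
cane = Synset("iwn-n-0001", "n", ["cane"])
animale = Synset("iwn-n-0002", "n", ["animale"])
synsets = {s.synset_id: s for s in (cane, animale)}

add_relation(synsets, "iwn-n-0001", "has_hyperonym", "iwn-n-0002")
cane.ili_links.append(("eq_synonym", "ili-dog-placeholder"))
```

Keeping the ILI links separate from the language-internal relations mirrors the description above: the ILI itself stores no relations among its records, so all structure lives in the language-specific wordnet.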

During the SI-TAL project, this wordnet was improved and extended through both the addition of nouns and verbs not yet encoded in EWN and the encoding of adjectives, adverbs and proper names. Some additional relations were also identified, mainly in order to encode data about adjectives (see [Alonge et al. 2000], [Roventini et al. 2000], [Marinelli and Roventini 2002] and [Roventini et al. 2003]).

In its generic version, the IWN database now consists of:

  • a wordnet containing about 47,000 lemmas, 50,000 synsets and 130,000 semantic relations (the most important of the encoded relations are hyperonymy/hyponymy, antonymy, meronymy, cause and role, among others);
  • an Inter-Lingual Index (ILI), which is an unstructured version of WN1.5: this module, used in EWN to link wordnets of different languages, was also maintained in IWN to make the resource usable in multilingual applications;
  • the Top Ontology (TO), a hierarchy of language-independent concepts, reflecting fundamental semantic distinctions, built within EWN and partially modified in IWN to account for adjectives (not dealt with in EWN): the TO is formed of language-independent features, which may (or may not) be lexicalised in various ways, or according to different patterns, in different languages [Rodriguez et al. 1998]; through the ILI, all the concepts in the wordnet are directly or indirectly linked to the TO.
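How a concept might reach the TO "directly or indirectly" can be sketched as follows. This is a hedged illustration under two assumptions not stated in the text: that only some ILI records carry TO features, and that a synset without such a record inherits features from its nearest hyperonym that has one. The function `top_concepts`, the identifiers and the feature labels are hypothetical, and each synset is given a single hyperonym for simplicity.

```python
def top_concepts(synset_id, hyperonym_of, ili_of, to_features):
    """Return TO features for a synset, climbing hyperonyms when needed."""
    seen = set()  # guards against cycles in the hyperonym chain
    current = synset_id
    while current is not None and current not in seen:
        seen.add(current)
        # Direct case: one of the synset's ILI records carries TO features.
        for ili_id in ili_of.get(current, []):
            if ili_id in to_features:
                return to_features[ili_id]
        # Indirect case: move one step up to the hyperonym and try again.
        current = hyperonym_of.get(current)
    return frozenset()


# Toy data (all identifiers and feature labels are illustrative).
hyperonym_of = {"bassotto": "cane", "cane": "animale"}
ili_of = {
    "bassotto": ["ili-dachshund"],
    "cane": ["ili-dog"],
    "animale": ["ili-animal"],
}
to_features = {"ili-animal": frozenset({"Animal", "Living", "Natural"})}

features = top_concepts("bassotto", hyperonym_of, ili_of, to_features)
```

In this toy setup "animale" is linked directly, while "cane" and "bassotto" reach the same features indirectly through the hyperonym chain.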

Since 2003, a terminological wordnet covering the domain of navigation and sea transportation, connected to the IWN generic wordnet, has been under development [Marinelli et al. 2004].

The IWN database is continuously updated and improved at ILC. In particular, studies have been carried out on proper names and their extended (metaphorical and metonymic) uses observable in the reference corpus of Italian [Marinelli et al. 2005].

For further information, please contact Monica Monachini or Roberto Bartolini.

 

Acronym:
ITALWORDNET

Status:
Ended