Annotated corpora
- Corpus of Sentences rated with Human Complexity Judgments
- Evalita 2011 “Domain Adaptation for Dependency Parsing”
- Evalita 2011 “Frame Labelin
- Evalita 2020: AcCompl-it Acceptability & Complexity evaluation task for Italian
- IMPaCTS – Italian Multilevel Parallel Corpus for Text Simplification
- ISACCO – Italian School-Age Children COrpus
- ISST-TANL Corpus
- PaCCSS-IT – Parallel Corpus of Complex-Simple Sentences for ITalian
- SemEval-2022 “PreTENS-Evaluating Neural Networks on Presuppositional Semantic Knowledge”
- SPLeT 2012 “First Shared Task on Dependency Parsing of Legal Texts”
- Terence and Teacher
Unannotated corpora
CLIC
Lexica
PAROLE-SIMPLE-CLIPS
It is a four-level general-purpose lexicon developed over three different projects. The core of the morphological and syntactic lexicon was built within the European project “Preparatory Action for the Organisation of Language Resources for Language Engineering” (LE-PAROLE). The linguistic model and the core of the semantic lexicon were developed within the European project “Semantic Information for Multifunctional Multilingual Lexicons” (LE-SIMPLE). The phonological level of description and the extension of lexical coverage were carried out within the Italian project “Corpora e Lessici dell’Italiano Parlato e Scritto” (CLIPS). The lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). It has been semantically encoded in full compliance with the international standards specified in the PAROLE-SIMPLE model, which is based on EAGLES. The syntactic and semantic encoding was carried out in collaboration with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 additional entries.
SIMPLE LOD
It is the RDF serialisation of all nouns extracted from the PAROLE-SIMPLE-CLIPS lexicon. Lexical entries are serialised in Lemon, while semantic relations are modelled according to SIMPLE’s OWL.
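As a rough illustration of what a Lemon-style serialisation of a noun can look like, the sketch below builds a handful of triples and prints them as N-Triples. The base URI (`http://example.org/simple/`), the lemma "cane" and the ontology reference are hypothetical placeholders, not identifiers taken from SIMPLE LOD; only the vocabulary terms (`LexicalEntry`, `canonicalForm`, `writtenRep`, `sense`, `reference`) come from the Lemon model.

```python
LEMON = "http://lemon-model.net/lemon#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/simple/"  # hypothetical base URI, not the real one


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def noun_entry_triples(lemma, concept_uri, lang="it"):
    """Return Lemon-style triples for one noun: entry, form, sense, reference."""
    entry = f"{BASE}entry/{lemma}"
    form = f"{entry}/form"
    sense = f"{entry}/sense"
    return [
        (uri(entry), uri(RDF_TYPE), uri(LEMON + "LexicalEntry")),
        (uri(entry), uri(LEMON + "canonicalForm"), uri(form)),
        (uri(form), uri(LEMON + "writtenRep"), literal(lemma, lang)),
        (uri(entry), uri(LEMON + "sense"), uri(sense)),
        # The sense points at an ontology concept (hypothetical URI here).
        (uri(sense), uri(LEMON + "reference"), uri(concept_uri)),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = noun_entry_triples("cane", BASE + "ontology/Animal")
print(to_ntriples(triples))
```

In practice an RDF library would handle parsing and serialisation; plain tuples are used here only to keep the sketch free of dependencies.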
ItalWordNet LOD
Italian Word Embeddings
Two sets of word embeddings, each trained on a different corpus: itWaC and Twitter.
Learn more: Italian Word Embeddings.
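A common way to use word embeddings is to compare words by the cosine similarity of their vectors. The sketch below shows the idea on toy data: the three-dimensional vectors and the words are invented for illustration and are not taken from the itWaC or Twitter embeddings, whose file format and dimensionality are not described here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, top_n=2):
    """Rank every other word in the vocabulary by similarity to `word`."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy vectors, invented for this example only.
toy_embeddings = {
    "gatto": [0.9, 0.1, 0.0],
    "cane": [0.8, 0.2, 0.1],
    "treno": [0.0, 0.1, 0.9],
}

print(most_similar("gatto", toy_embeddings))
```

With these toy vectors, "cane" ranks above "treno" as a neighbour of "gatto", because its vector points in nearly the same direction.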
FrameNet
GeoDomainWordNet
datahub; ILC for English; ILC for Italian
The concepts of the GeoNames ontology, with their labels and glosses in English and Italian, have been transformed into a WordNet-like resource and linked to the generic WordNets of both languages. The resource is published in RDF in accordance with W3C standards and the Lemon schema.
AncientGreekWordNet LOD
Linked open data related to the ‘AncientGreekWordNet’ section of CoPhiWordNet.
Sentiment Lexicon LOD
The Italian Sentiment Lexicon (in LMF format) was developed semi-automatically from ItalWordNet, starting from a manually checked list of 1,000 keywords. It contains 24,293 lexical entries annotated with positive, negative or neutral polarity.
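A polarity lexicon of this kind is typically used as a lookup table. The sketch below assumes the entries have already been loaded into a plain dictionary from lemma to polarity label; the three entries, the function names and the majority-vote rule are illustrative assumptions and do not reproduce the actual LMF structure or any method published with the resource.

```python
from collections import Counter

# Toy entries, invented for this example; the real lexicon has 24,293 entries.
toy_lexicon = {
    "bello": "positive",
    "brutto": "negative",
    "tavolo": "neutral",
}


def polarity_counts(tokens, lexicon):
    """Count how many tokens carry each polarity label; unknown tokens are skipped."""
    counts = Counter()
    for token in tokens:
        label = lexicon.get(token.lower())
        if label is not None:
            counts[label] += 1
    return counts


def overall_polarity(tokens, lexicon):
    """Simple majority vote between positive and negative hits; ties are neutral."""
    counts = polarity_counts(tokens, lexicon)
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return "neutral"


sentence = "Un tavolo bello ma brutto e bello".split()
print(polarity_counts(sentence, toy_lexicon))
print(overall_polarity(sentence, toy_lexicon))
```

The sketch matches surface forms directly, which is a simplification: inflected forms would need to be lemmatised before lookup.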
Twitter for Sentiment Analysis
The corpus “Twitter for Sentiment Analysis” is a collection of tweets containing text and images, collected from July to December 2016. Each tweet has been labeled according to the sentiment polarity of its text. The tweets with the most confident textual sentiment predictions were selected to build the Twitter for Sentiment Analysis (T4SA) dataset.
Learn more: Twitter for Sentiment Analysis
Domain Terminologies
IMAG-Act
It is an interlingual action ontology. Using speech corpora, 1,010 high-frequency action concepts were identified and visually represented with prototypical scenes. The ontology allows the definition of interlingual correspondences between verbs and actions in English, Italian, Chinese and Spanish. Thanks to the visual representation of the identified action concepts, IMAG-Act can potentially be extended to any language.
FiscalDB
SindacDB
Mariterm
Biolessico
Ontologies
Other resources
The ILC4CLARIN repository hosts a constantly updated collection of language resources developed by the Cnr-Istituto di Linguistica Computazionale “Antonio Zampolli”. These resources are deposited and made available in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles.
