{"id":10771,"date":"2022-07-25T08:39:55","date_gmt":"2022-07-25T06:39:55","guid":{"rendered":"https:\/\/www.ilc.cnr.it\/resources\/"},"modified":"2026-05-28T15:11:44","modified_gmt":"2026-05-28T13:11:44","slug":"resources","status":"publish","type":"page","link":"https:\/\/www.ilc.cnr.it\/en\/resources\/","title":{"rendered":"RESOURCES"},"content":{"rendered":"\n<h1 class=\"wp-block-heading\"><strong>Annotated corpora<\/strong><\/h1>\n\n\n\n<ul>\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/corpus-of-sentences-rated-with-human-complexity-judgments\/\">Corpus of Sentences rated with Human Complexity Judgments<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/cita-corpus-italiano-di-apprendenti-l1\/\">CItA &#8211; Corpus Italiano di Apprendenti L1<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/evalita-2011-domain-adaptation-for-dependency-parsing\/\">Evalita 2011 \u201cDomain Adaptation for Dependency Parsing\u201d<\/a><\/li>\n\n\n\n<li><a href=\"http:\/\/sag.art.uniroma2.it\/flait\/\" target=\"_blank\" rel=\"noreferrer noopener\">Evalita 2011 \u201cFrame Labelin<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/evalita-2020-accompl-it-acceptability-complexity-evaluation-task-for-italian\/\">Evalita 2020: AcCompl-it Acceptability &amp; Complexity evaluation task for Italian<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/impacts-italian-multilevel-parallel-corpus-for-text-simplification\/\">IMPaCTS &#8211; Italian Multilevel Parallel Corpus for Text Simplification<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/isacco-italian-school-age-children-corpus\/\">ISACCO &#8211; Italian School-Age Children COrpus<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/isst-tanl-corpus\/\">ISST-TANL Corpus<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/paccss-it-parallel-corpus-of-complex-simple-sentences-for-italian\/\">PaCCSS-IT &#8211; Parallel Corpus of Complex-Simple Sentences for 
ITalian<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/semeval-2022-pretens-evaluating-neural-networks-on-presuppositional-semantic-knowledge\/\">SemEval-2022 \u201cPreTENS-Evaluating Neural Networks on Presuppositional Semantic Knowledge\u201d<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/similex-sentence-similarity\/\">SimilEx &#8211; Sentence Similarity<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/splet-2012-first-shared-task-on-dependency-parsing-of-legal-texts\/\">SPLeT 2012 \u201cFirst Shared Task on Dependency Parsing of Legal Texts\u201d<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.ilc.cnr.it\/en\/terence-and-teacher\/\">Terence and Teacher<\/a><\/li>\n<\/ul>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h1 class=\"wp-block-heading\"><strong>Unannotated corpora<\/strong><\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">CLIC<\/h3>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h1 class=\"wp-block-heading\">Lexica<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">PAROLE-SIMPLE-CLIPS<\/h3>\n\n\n\n<p>It is a four-level general-purpose lexicon that was developed across three different projects. The morphological and syntactic lexicon core was built within the European project &#8220;Preparatory Action for the Organisation of Language Resources for Language Engineering&#8221; (LE-PAROLE). The language model and the semantic lexicon core were developed within the European project &#8220;Semantic Information for Multifunctional Multilingual Lexicons&#8221; (LE-SIMPLE). The phonological level of description and the extension of lexical coverage were carried out in the context of the Italian project &#8220;Corpora e Lessici dell&#8217;Italiano Parlato e Scritto&#8221; (CLIPS). 
It comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). It has been semantically coded in full compliance with the international standards specified in the PAROLE-SIMPLE model and based on EAGLES. The syntactic and semantic encoding was carried out in collaboration with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 additional entries.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">SIMPLE LOD<\/h3>\n\n\n\n<p>It is the RDF serialisation of all nouns extracted from the PAROLE-SIMPLE-CLIPS lexicon. Lexical entries are serialised in Lemon, while semantic relations are modelled according to SIMPLE&#8217;s OWL.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">ItalWordNet LOD<\/h3>\n\n\n\n<p><strong><a href=\"http:\/\/datahub.io\/dataset\/iwn\" target=\"_blank\" rel=\"noreferrer noopener\">datahub<\/a><\/strong>; <a href=\"http:\/\/www.languagelibrary.eu\/owl\/italWordNet15\/schema\/synset\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>ilc<\/strong><\/a><\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Italian Word Embeddings<\/h3>\n\n\n\n<p>Two sets of word embeddings trained starting from two different corpora: itWaC and Twitter.<br>Learn more: <a href=\"https:\/\/www.ilc.cnr.it\/en\/italian-word-embeddings\/\">Italian Word Embeddings<\/a>.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">FrameNet<\/h3>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">GeoDomainWordNet<\/h3>\n\n\n\n<p><strong><a 
href=\"\/\/datahub.io\/dataset\/geodomainwn\" target=\"_blank\" rel=\"noreferrer noopener\">datahub<\/a><\/strong>; <strong><a href=\"http:\/\/www.languagelibrary.eu\/owl\/geodomainWN\/eng\/geonames-synset\" target=\"_blank\" rel=\"noreferrer noopener\">ILC for English<\/a><\/strong>; <strong><a href=\"http:\/\/www.languagelibrary.eu\/owl\/geodomainWN\/ita\/geonames-synset\" target=\"_blank\" rel=\"noreferrer noopener\">ILC for Italian<\/a> <\/strong>\nThe concepts of the GeoNames ontology, with their labels and glosses in English and Italian, have been transformed into a WordNet-like resource and linked to the generic WordNets of both languages. This resource is published in RDF in accordance with W3C standards and the Lemon schema.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>AncientGreekWordNet LOD<\/strong><\/h3>\n\n\n\n<p>Linked open data related to the &#8216;AncientGreekWordNet&#8217; section of CoPhiWordNet.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Sentiment Lexicon LOD<\/h3>\n\n\n\n<p>The <a href=\"https:\/\/github.com\/opener-project\/public-sentiment-lexicons\/tree\/master\/propagation_lexicons\/it\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Italian Sentiment Lexicon<\/strong><\/a> (in LMF format) was derived semi-automatically from ItalWordNet, starting from a manually checked list of 1,000 keywords. It contains 24,293 lexical entries annotated with positive\/negative\/neutral polarity.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Twitter for Sentiment Analysis<\/h3>\n\n\n\n<p>The corpus &#8220;Twitter for Sentiment Analysis&#8221; is a collection of tweets containing text and images collected from July to December 2016. 
Each tweet has been labeled according to the sentiment polarity of its text. The tweets with the most confident textual sentiment predictions were selected to build the Twitter for Sentiment Analysis (T4SA) dataset.<br>Learn more: <a href=\"https:\/\/www.ilc.cnr.it\/en\/twitter-for-sentiment-analysis\/\">Twitter for Sentiment Analysis<\/a><\/p>\n\n\n\n<div style=\"height:25px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h1 class=\"wp-block-heading\"><strong>Domain Terminologies<\/strong><\/h1>\n\n\n\n<h3 class=\"wp-block-heading\"><a href=\"http:\/\/www.imagact.it\/imagact\/query\/dictionary.seam\" target=\"_blank\" rel=\"noreferrer noopener\">IMAG-Act<\/a><\/h3>\n\n\n\n<p>It is an interlingual action ontology. Using speech corpora, 1,010 high-frequency action concepts were identified and visually represented with prototypical scenes. The ontology allows the definition of interlingual correspondences between verbs and actions in English, Italian, Chinese and Spanish. 
Thanks to the visual representation of the identified action concepts, IMAG-Act can potentially be extended to any language.<\/p>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">FiscalDB<\/h3>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">SindacDB<\/h3>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Mariterm<\/h3>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Biolessico<\/h3>\n\n\n\n<div style=\"height:13px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<h3 class=\"wp-block-heading\">Ontologies<\/h3>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"wp-block-heading\">Other resources<\/h1>\n\n\n\n<p>The <strong><a href=\"https:\/\/ilc4clarin.ilc.cnr.it\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">ILC4CLARIN<\/a><\/strong> repository hosts a constantly updated collection of language resources developed by the <strong>Cnr-Istituto di Linguistica Computazionale &#8220;Antonio Zampolli&#8221;<\/strong>. 
These resources are deposited and made available in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-layout-flex wp-container-core-buttons-layout-1 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/dspace-clarin-it.ilc.cnr.it\/home\" target=\"_blank\" rel=\"noreferrer noopener\">BROWSE THE COLLECTION<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Annotated corpora Unannotated corpora CLIC Lexica PAROLE-SIMPLE-CLIPS It is a four-level general-purpose lexicon that has been developed in three different&hellip;<\/p>\n","protected":false},"author":3,"featured_media":10092,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"acf":[],"jetpack_sharing_enabled":true,"publishpress_future_action":{"enabled":false,"date":"2026-08-06 
10:13:23","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"translation_priority"},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"_links":{"self":[{"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/pages\/10771"}],"collection":[{"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/comments?post=10771"}],"version-history":[{"count":38,"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/pages\/10771\/revisions"}],"predecessor-version":[{"id":24880,"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/pages\/10771\/revisions\/24880"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/media\/10092"}],"wp:attachment":[{"href":"https:\/\/www.ilc.cnr.it\/en\/wp-json\/wp\/v2\/media?parent=10771"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}