Knowledge Extraction Tools
A platform for the automatic extraction of linguistic and domain-specific information from document collections. It organizes the extracted knowledge in a structured form and indexes the analysed texts with respect to that knowledge. It relies on a battery of tools for Natural Language Processing (NLP), statistical text analysis and machine learning, which are dynamically integrated to provide an accurate representation of the linguistic information and the domain-specific content of English and Italian text corpora from different domains.
READ-IT: Assessing Readability of Italian Texts
READ-IT is the first advanced readability assessment tool for Italian. It combines traditional raw text features with lexical, morpho-syntactic and syntactic information. In READ-IT, readability is assessed at the level of both documents and sentences. Sentence-level assessment is the main novelty of the approach: it creates the prerequisites for aligning readability assessment with the text simplification process.
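To make the distinction between document-level and sentence-level assessment concrete, here is a minimal sketch in Python. It is not READ-IT's implementation: it computes only two simple raw text features (sentence length in tokens and average word length in characters), and the function names, the regex-based tokenizer and the sentence splitter are all assumptions made for illustration. The lexical, morpho-syntactic and syntactic features mentioned above would require a tagger and a parser and are not covered here.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_features(sentence):
    """Raw text features for a single sentence."""
    tokens = re.findall(r"\w+", sentence)  # word-like tokens, punctuation dropped
    n_tokens = len(tokens)
    n_chars = sum(len(t) for t in tokens)
    return {
        "tokens": n_tokens,
        "avg_word_length": n_chars / n_tokens if n_tokens else 0.0,
    }


def document_features(text):
    """The same raw features aggregated over the whole document."""
    sentences = split_sentences(text)
    all_tokens = [t for s in sentences for t in re.findall(r"\w+", s)]
    n_sentences = len(sentences)
    n_tokens = len(all_tokens)
    return {
        "sentences": n_sentences,
        "avg_sentence_length": n_tokens / n_sentences if n_sentences else 0.0,
        "avg_word_length": sum(len(t) for t in all_tokens) / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    text = "Il gatto dorme. La casa è grande."
    for sentence in split_sentences(text):
        print(sentence, sentence_features(sentence))
    print(document_features(text))
```

Scoring each sentence separately, rather than only averaging over the document, is what would let a later simplification step target the individual sentences that are hardest to read.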
Services developed within the European project PANACEA and hosted at ILC-CNR. They support the automatic construction of language resources and provide format converters, POS taggers, dependency parsers and lexicon acquisition tools (multiword and subcategorization extractors, lexicon mergers). Tutorials on using these services and composing workflows are also available.