PAROLE-SIMPLE-CLIPS is a four-level,
general-purpose computational lexicon that has been developed over three
different projects. The kernel of the morphological and syntactic lexicons was
built in the framework of the LE-PAROLE project. The linguistic model and the
core of the semantic lexicon were developed in the LE-SIMPLE project, while
the phonological level of description and the extension of lexical coverage
were carried out within the Italian project Corpora e
Lessici dell'Italiano Parlato e Scritto
(CLIPS, "Corpora and Lexicons of Spoken and Written Italian").
The whole PAROLE-SIMPLE-CLIPS
lexicon consists of 55,000 lemmas with phonological, morphological and
syntactic description and 55,000 word senses encoded at the semantic level[1], in full accordance with the
international standards set out in the PAROLE-SIMPLE model. PAROLE-SIMPLE-CLIPS
therefore has the significant advantage of being compatible with the other
eleven PAROLE-SIMPLE lexicons, built for other European languages, which
share a common theoretical model, representation language and building
methodology.
A PAROLE-SIMPLE-CLIPS entry
gathers together all the phonological, morphological and inherent syntactic and
semantic properties of a headword. Each of its subcategorization patterns is
described in terms of optionality, syntactic function, syntagmatic realization
as well as morpho-syntactic, syntactic and lexical properties of each slot
filler. At the semantic level, the theoretical approach adopted by the SIMPLE
model is essentially grounded in a revised version of some fundamental
aspects of the Generative Lexicon.
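To make the entry structure described above more concrete, it can be pictured as a nested record. The following is a minimal Python sketch under our own assumptions: the class names (`Slot`, `Frame`, `Entry`), their fields, the `frame_matches` helper and the slot labels are hypothetical illustrations, not the actual PAROLE-SIMPLE representation language.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Slot:
    """One position in a subcategorization frame (hypothetical field names)."""
    function: str                 # syntactic function, e.g. "subj", "obj"
    realization: str              # syntagmatic realization, e.g. "NP", "PP"
    optional: bool = False        # whether the slot may remain unexpressed
    filler_features: List[str] = field(default_factory=list)  # constraints on the filler


@dataclass
class Frame:
    """A subcategorization pattern: an ordered list of slots."""
    slots: List[Slot]


@dataclass
class Entry:
    """A headword with phonological, morphological and syntactic information."""
    lemma: str
    phonology: str                # e.g. an IPA transcription
    pos: str                      # part of speech
    frames: List[Frame]           # an entry may have more than one pattern


def frame_matches(frame: Frame, realized: Set[str]) -> bool:
    """True if the realized functions fill every obligatory slot and no unknown one."""
    allowed = {s.function for s in frame.slots}
    required = {s.function for s in frame.slots if not s.optional}
    return required <= realized <= allowed


# Hypothetical encoding of Italian "mangiare" (to eat) with an optional object.
mangiare = Entry(
    lemma="mangiare",
    phonology="manˈdʒare",
    pos="V",
    frames=[Frame([
        Slot("subj", "NP"),
        Slot("obj", "NP", optional=True),
    ])],
)

print(frame_matches(mangiare.frames[0], {"subj"}))         # True
print(frame_matches(mangiare.frames[0], {"subj", "obj"}))  # True
print(frame_matches(mangiare.frames[0], {"obj"}))          # False
```

Marking optionality on each slot, rather than listing every subset as a separate frame, keeps one pattern per construction, which is the kind of economy the slot-level description aims at.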
A SIMPLE-CLIPS semantic unit
is richly endowed with fine-grained, structured information that is highly
relevant for NLP applications. First among these is ontological typing:
the lexicon is in fact structured in terms of a multidimensional type system
based on both hierarchical and non-hierarchical conceptual relations, taking
into account the principle of orthogonal inheritance. Other relevant
information types in a word entry are its domain of use; type of denoted event;
synonymy and morphological derivation relations; membership in a class of
regular polysemy; and any relevant distinctive semantic features.
Particularly rich is the information encoded in the Extended Qualia
Structure (a set of 60 semantic relations that make it possible to model both the
different meaning dimensions of a word sense and its relationships to other
lexical units) and in the Predicative Representation, which describes the semantic
scenario in which the word sense is involved and characterizes its participants in
terms of thematic roles and semantic constraints.
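A semantic unit combining ontological typing with orthogonal inheritance and qualia-style relations might be sketched as follows. This is an illustrative Python sketch only: the type names in `TYPE_PARENTS`, the relation labels (`used_for`, `made_of`), the unit identifiers and the `SemanticUnit` class are hypothetical and are not taken from the SIMPLE ontology or its 60-relation inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical fragment of a multidimensional type system. A type may have
# several parents: a hierarchical one plus orthogonal ones along other
# meaning dimensions (here "Telic" stands for a purpose dimension).
TYPE_PARENTS: Dict[str, List[str]] = {
    "Entity": [],
    "Concrete_entity": ["Entity"],
    "Artifact": ["Concrete_entity"],
    "Telic": ["Entity"],
    "Instrument": ["Artifact", "Telic"],
}


def ancestors(sem_type: str) -> Set[str]:
    """Collect every supertype reachable through any parent link."""
    seen: Set[str] = set()
    stack = list(TYPE_PARENTS[sem_type])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(TYPE_PARENTS[parent])
    return seen


@dataclass
class SemanticUnit:
    unit_id: str
    lemma: str
    sem_type: str
    domain: str = "general"
    # Qualia-style links, stored as (relation name, target unit id) pairs.
    qualia: List[Tuple[str, str]] = field(default_factory=list)

    def is_a(self, sem_type: str) -> bool:
        """True if the unit's type equals or inherits from sem_type."""
        return sem_type == self.sem_type or sem_type in ancestors(self.sem_type)

    def related(self, relation: str) -> List[str]:
        """Targets of all links carrying the given relation name."""
        return [target for rel, target in self.qualia if rel == relation]


knife = SemanticUnit(
    unit_id="USem_coltello_1",
    lemma="coltello",
    sem_type="Instrument",
    qualia=[("used_for", "USem_tagliare_1"), ("made_of", "USem_metallo_1")],
)

print(knife.is_a("Artifact"), knife.is_a("Telic"))  # True True
print(knife.related("used_for"))                    # ['USem_tagliare_1']
```

Because `Instrument` inherits from both `Artifact` and `Telic`, a query on either dimension succeeds without the hierarchy having to fold the purpose dimension into the physical one.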
In a word’s description,
lexical information is interrelated across the four levels of description. In
particular, the syntactic and semantic levels are linked through the
projection of the predicate-argument structure onto its syntactic
realization(s).
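The projection of a predicate-argument structure onto one or more syntactic realizations can be illustrated with a small linking function. In this minimal Python sketch, which rests on our own assumptions, `Argument`, `Predicate`, `project`, the predicate name `PRED_mangiare` and the slot label `obl_da` are all hypothetical. Each realization is modelled as a mapping from argument positions to slot functions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Argument:
    role: str        # thematic role, e.g. "Agent"
    constraint: str  # semantic type required of the filler


@dataclass
class Predicate:
    name: str
    args: List[Argument]


def project(pred: Predicate, linking: Dict[int, str]) -> Dict[str, Argument]:
    """Map syntactic functions to predicate arguments for one realization.

    `linking` sends an argument position (0-based) to a slot function.
    """
    realized: Dict[str, Argument] = {}
    for position, function in linking.items():
        if not 0 <= position < len(pred.args):
            raise IndexError(f"{pred.name} has no argument {position}")
        if function in realized:
            raise ValueError(f"slot {function!r} linked twice")
        realized[function] = pred.args[position]
    return realized


eat = Predicate("PRED_mangiare", [Argument("Agent", "Animate"), Argument("Patient", "Food")])

# The same predicate projected onto two syntactic realizations.
active = project(eat, {0: "subj", 1: "obj"})
passive = project(eat, {1: "subj", 0: "obl_da"})  # hypothetical "da"-phrase slot

print(active["subj"].role, passive["subj"].role)  # Agent Patient
```

Keeping the predicate separate from its linkings lets several syntactic frames share one semantic description, which mirrors the "realization(s)" plural in the text.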
The PAROLE-SIMPLE-CLIPS
lexicon was developed at the CNR Institute for Computational Linguistics (CNR-ILC)
and is distributed by the
Evaluations and Language resources Distribution Agency (ELDA).
______________________________
[1]
Syntactic and semantic encoding was carried out jointly with Thamus (Consortium for Multilingual
Documentary Engineering), which was responsible for 25,000 entries.