The DylanLab organizes the Domain Adaptation for Dependency Parsing task and the Frame Labeling over Italian Texts task
Evalita 2011 is about to start!
For more information, see the Evalita 2011 Call for Interest
EVALITA is an initiative devoted to the Evaluation of Natural Language Processing and Speech tools for Italian.
The DylanLab organizes the two following tasks:
Domain Adaptation for Dependency Parsing
The domain adaptation task aims to investigate techniques for adapting state-of-the-art dependency parsing systems to domains other than those of the data on which they were trained or developed. This is the first time that such a task has been proposed in the framework of the EVALITA campaign and for the Italian language.
The goal of this task is to learn how to increase the accuracy of a parsing system when dealing with out-of-domain texts. In particular, the task will consist in learning how to derive labelled dependency relations for Italian by means of a parser developed for general language.
The following data sets (in CoNLL format) will be distributed:
- for the source domain:
  - a training set consisting of the ISST-TANL corpus, jointly developed by the Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC-CNR) and the University of Pisa (UniPi) and already used in the dependency parsing track of EVALITA 2009 (pilot sub-task);
  - a development set of about 5,000 tokens;
- for the target domain:
  - a target corpus drawn from an Italian legislative corpus, gathering laws enacted by different releasing agencies (European Commission, Italian State and Regions) and regulating a variety of domains, ranging from environment, human rights and disability rights to freedom of expression. The target corpus includes automatically generated sentence splitting, tokenization and PoS tagging;
  - a manually annotated development set of about 5,000 tokens, also including labeled dependency relations.
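As a rough illustration of how such data could be read, the sketch below assumes the 10-column, tab-separated CoNLL-X layout with blank lines between sentences; the announcement only says "CoNLL format", so the exact columns are an assumption. The `Token` type, the `read_conll` function, and the sample sentence with its tags are hypothetical and invented for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class Token(NamedTuple):
    """One token row; fields follow the CoNLL-X column order (assumed layout)."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: Optional[int]  # 0 = artificial root, None = not annotated ("_")
    deprel: str


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence; a blank line ends a sentence."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        # Corpora without dependency annotation may carry "_" in the HEAD column.
        head = int(cols[6]) if cols[6] != "_" else None
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], head, cols[7]))
    if sentence:
        yield sentence


# Invented sample sentence; the tags and labels are illustrative only.
sample = (
    "1\tIl\til\tR\tRD\t_\t2\tdet\t_\t_\n"
    "2\tcane\tcane\tS\tS\t_\t3\tsubj\t_\t_\n"
    "3\tdorme\tdormire\tV\tV\t_\t0\tROOT\t_\t_\n"
    "\n"
)
sentences = list(read_conll(sample.splitlines()))
```

Treating "_" in the head column as missing lets the same reader handle a corpus that carries only sentence splitting, tokenization and PoS tagging.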
Evaluation will be carried out in terms of standard dependency parsing accuracy measures (labeled attachment score, unlabeled attachment score, label accuracy) with respect to a test set of texts from the target domain of about 5,000 tokens, including manually revised PoS tags.
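The three measures can be sketched as follows, using their usual token-level definitions: the share of tokens with the correct head (unlabeled attachment score), the correct dependency label (label accuracy), or both (labeled attachment score). This is a minimal sketch, not the official scorer; the function name and the example arcs are hypothetical, and details such as the treatment of punctuation are not specified in the announcement.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float, float]:
    """Return (LAS, UAS, LA) over two aligned token sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    n = len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / n        # head and label correct
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n  # head correct
    la = sum(g[1] == p[1] for g, p in zip(gold, pred)) / n   # label correct
    return las, uas, la


# Invented four-token example: one wrong label, one wrong head.
gold = [(2, "det"), (3, "subj"), (0, "ROOT"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "ROOT"), (2, "obj")]
las, uas, la = attachment_scores(gold, pred)
print(las, uas, la)  # 0.5 0.75 0.75
```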
Participating systems may only use the resources (data) provided by the organizers. Consequently, additional components that have been trained on any other data set are prohibited.
Organizers: Felice Dell’Orletta (ILC-CNR, Pisa), Simonetta Montemagni (ILC-CNR, Pisa), Giulia Venturi (ILC-CNR, Pisa)
Frame Labeling over Italian Texts
In the “Frame Labeling over Italian Texts” (FLaIT) evaluation exercise, systems have to detect the semantic frame “evoked” by a predicate and the major semantic roles explicitly mentioned in an Italian sentence, according to the frame semantics paradigm (Fillmore, 1985). In particular, the task consists in recognizing words and phrases that evoke semantic frames of the sort defined in the FrameNet project (Baker et al., 1998, http://framenet.icsi.berkeley.edu), together with their semantic dependents, which are usually, but not always, their syntactic dependents.
We will refer to this problem as Semantic Role Labeling (SRL). As in previous SRL shared tasks (e.g. CoNLL-2004 and CoNLL-2005), the general goal is to propose representation models, inductive algorithms and inference methods that address the proposed SRL problem.
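To make the expected output concrete, the sketch below shows one possible in-memory representation of a frame annotation: the evoked frame, the span of the evoking predicate, and the spans of its semantic roles. The `FrameAnnotation` class, the `span_text` helper and the example sentence are hypothetical; the actual FLaIT data format is not described here. `Commerce_buy`, `Buyer` and `Goods` are names taken from the English FrameNet, used purely as an illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # token offsets: start inclusive, end exclusive


@dataclass
class FrameAnnotation:
    """Hypothetical container for one frame evoked in a sentence."""
    frame: str                                            # name of the evoked frame
    target: Span                                          # the frame-evoking predicate
    roles: Dict[str, Span] = field(default_factory=dict)  # role name -> span


def span_text(tokens: List[str], span: Span) -> str:
    """Return the surface string covered by a span."""
    return " ".join(tokens[span[0]:span[1]])


# Invented example: "Maria ha comprato un libro" (Maria bought a book).
tokens = ["Maria", "ha", "comprato", "un", "libro"]
annotation = FrameAnnotation(
    frame="Commerce_buy",
    target=(2, 3),
    roles={"Buyer": (0, 1), "Goods": (3, 5)},
)

for role, span in annotation.roles.items():
    print(role, "->", span_text(tokens, span))
```

Storing roles as token spans rather than as dependency nodes reflects the point that semantic dependents are usually, but not always, syntactic dependents of the predicate.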
Previous campaigns (such as CoNLL-2004/2005 or SemEval 2007 (Baker et al., 2007)) focused on developing SRL systems based on partial parsing information and/or on increasing the amount of syntactic and semantic input information, aiming to boost the performance of machine learning systems on the SRL task. Accordingly, the Evalita 2011 FLaIT challenge will concentrate on the definition of different tasks, focusing on different aspects of the SRL problem:
- We encourage the adoption of basic resources for Italian that are under development in the iFrame project (http://sag.art.uniroma2.it/iframe/doku.php). These resources will be made publicly available to all groups participating in the FLaIT task.
- The exploitation of syntactic information is also encouraged, as the challenge is expected to shed some light on the impact of current parsing resources and technologies for Italian on the overall SRL task.
- Interested groups that cannot rely on proprietary parsing technologies will be supported in their participation: they will be provided with annotations for the development and test data at the morphological and syntactic level (at least lemmas, POS tags and Named Entities are expected). The quality of this auxiliary information may not be homogeneous, as no full manual validation of the released material is expected for the 2011 EVALITA edition.
- The use of external lexico-semantic knowledge bases for the Italian language is also encouraged, as the impact of the different resources is one of the targeted research aspects of the challenge. All participants are encouraged to propose novel learning architectures for better exploiting the data structures, relations and constraints of the problem.
Participating systems will be evaluated in different categories, depending on whether they use only the auxiliary information contained in the training data (closed challenge) or also make use of external sources of information and/or tools (open challenge). Participants in the open challenge are encouraged to propose novel ideas for using rich semantic information, e.g. WordNet for Italian, other lexico-semantic resources such as distributional lexical semantic information, or word sense disambiguation tools. The use of unlabeled examples might also be considered.
Organizers: Roberto Basili (University of Roma, Tor Vergata), Alessandro Lenci (University of Pisa)
Steering Committee: Alessandro Moschitti (University of Trento), Sara Tonelli (University of Venice), Diego Decao (University of Roma, Tor Vergata), Giampaolo Mazzini (CELI, Torino)