Both Linguistic Complexity (LC) and Processing Difficulty (PD) are multi-factorial and dynamic notions that relate to a multi-level linguistic competence, including lexical, morphological, and syntactic competences. Most research in the area has focused on specific sub-domains, e.g. morphology or syntax.
To the best of our knowledge, the interactions among complexity features belonging to different linguistic levels and their global effects on PD have never been investigated systematically for a given language. Even within individual sub-domains, several theoretical and operational definitions coexist, which makes interpretation and accumulation of findings problematic.
Recently, the increasing availability of reliable language technologies enabling complexity analysis has opened new research horizons. Natural Language Processing (NLP) methods and techniques make it possible to ground linguistic complexity research on corpora reflecting real language usage, to track and quantify a wide range of multi-level linguistic features, and to computationally model LC and PD.
Main objectives
- to arrive at theoretical and operational definitions of a set of linguistic features contributing to LC in Italian. Such definitions will be based on both previous research and new insights gained in the project from collected comprehension and production data and from computational modelling.
- to profile LC in texts written by monolingual and multilingual students of grade 13, including students with atypical development (typically, deaf students and students with Learning Difficulties, i.e., those who most probably will not continue their studies at university). To do so, written productions will be collected based on visual stimuli eliciting the production of target linguistic features and ensuring the comparability of results across populations.
- to look at the effects of LC on receptive processes. A set of linguistic prompts and tasks will be developed and administered to the same populations. The prompts will be constructed so as to reflect different types and degrees of LC, in order to assess their impact on comprehension and acceptability (operationalized as test responses) and on online cognitive processing (operationalized as reaction times and eye movements).
- to examine the role of LC and PD in a large-scale assessment scenario, namely, the yearly National Assessment carried out by the National Institute for the evaluation of the education and training system (INVALSI) in Italy. Results from linguistic, psycholinguistic and computational analyses will be incorporated in INVALSI’s psychometric models, in order to reach a better understanding of the factors affecting reading comprehension among students attending the last grade of secondary school.
- to investigate how the project results could be applied to teaching contexts, in order to propose more effective practices to foster receptive and productive skills in different populations, such as students with typical and atypical development and monolingual or multilingual speakers.
Both production and reception data will provide evidence for the relationships between LC and PD: features causing lower comprehension accuracy and higher processing load can be seen as more difficult.
The relationship between LC and PD will also be examined with NLP methods and techniques based on machine learning algorithms, which will be applied to:
- existing corpora and resources, including results of large-scale assessments, to provide additional evidence for identifying features contributing to texts’ linguistic complexity;
- the collected production and comprehension data in order to focus on the interactions among different sources of LC and their effects on PD.
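To make the idea of tracking and quantifying linguistic features concrete, the following is a minimal sketch of how a few surface proxies of complexity could be computed from raw text. It is an illustration only, not the project's method: the function name `surface_complexity_features`, the three features chosen (mean sentence length, type-token ratio, mean word length) and the naive regex-based sentence and word splitting are all assumptions made for this example. A real analysis of Italian would rely on a proper NLP pipeline and a much richer, multi-level feature set.

```python
import re


def surface_complexity_features(text: str) -> dict:
    """Compute a few illustrative surface proxies of linguistic complexity.

    Hypothetical helper for illustration; the features and the naive
    tokenization are assumptions, not the project's feature set.
    """
    # Naive sentence split: break on runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive tokenization: runs of Unicode letters, lowercased
    # (an elided form such as "l'acqua" is split into "l" and "acqua").
    tokens = re.findall(r"[^\W\d_]+", text.lower())

    if not sentences or not tokens:
        return {
            "mean_sentence_length": 0.0,
            "type_token_ratio": 0.0,
            "mean_word_length": 0.0,
        }

    return {
        # Average number of tokens per sentence (rough syntactic proxy).
        "mean_sentence_length": len(tokens) / len(sentences),
        # Distinct word forms over total tokens (rough lexical proxy).
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average token length in characters (rough lexical proxy).
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


if __name__ == "__main__":
    sample = "Il gatto dorme. Il cane corre."
    print(surface_complexity_features(sample))
```

Feature vectors of this kind could then be paired with comprehension accuracy or reaction times to explore which features co-vary with processing difficulty. How the features are selected and how that relationship is modelled are left open here.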