A lexical corpus-based model of Contemporary Written Arabic
Type of Project: 
Funding Body: 
Ministero dell'Istruzione, dell'Università e della Ricerca
Funding Programme: 
PRIN 2020 – SH4
Grant Agreement: 
B57G22000800006 (Prot. 20204EJYRX)
Start Date: 
End Date: 
Role of ILC: 
Project Coordinator: 
Università degli Studi ROMA TRE (ROMA TRE)
Project Chair: 
Giuliano Lancioni (ROMA TRE)
ILC Research Unit Chair: 

Arabic has traditionally been described in terms of diglossia: two distinct levels of the same language, an upper, written, formal one (Classical/Standard Arabic) and a lower, oral, informal one (the different varieties of spoken Arabic, the so-called Arabic dialects), which speakers mix through code-switching or code-mixing.

The project aims to create a lexicographic resource for Contemporary Written Arabic (CWA) that takes into account materials whose features are found in real-world written Arabic texts, without classifying them in advance according to their linguistic nature.

The project will therefore propose a new theoretical approach that moves beyond the traditional description of the Arabic linguistic system in terms of diglossia and interprets Arabic as a linguistic complex.

A final test model will be produced, which will aim to be the first large-scale validated CWA resource providing objective and substantial data to:
   (i) test competing theories on the linguistic status of the Arabic language;
   (ii) prove the extensibility of the model to the complete coverage of CWA.

The project's approach may allow Arabic to be analyzed as other languages have been for some decades within the corpus linguistics tradition: as a language whose lexicon (and grammar) can be described in a variety-neutral way, through the analysis of a representative corpus defined according to a set of external, objective criteria (such as timespan, genres, areas).

The design of the resulting lexical resource will encourage new approaches to Arabic language teaching and learning, overcoming the long-standing issue of diglossia.

Because it offers an objective, corpus-based description of CWA, the planned lexical resource may in the long term play a crucial role in fostering social dialogue with and within Arabic-speaking minorities in Italy.

In fact, the outcome of the project may have a positive social impact on the inclusion of Arabic-speaking communities, since it will contribute to producing teaching and learning materials that are closer to the actual linguistic reality of native Arabic speakers, also in connection with the existing (but still very limited) presence of Arabic among the languages taught in Italian high schools.