A lexical corpus-based model of Contemporary Written Arabic

Type of project: National  |  Start date: 01/06/2022  |  End date: 31/05/2025

Arabic has traditionally been described in terms of diglossia: two distinct levels of the same language, an upper, written, formal one (Classical/Standard Arabic) and a lower, oral, informal one (the different varieties of spoken Arabic, the so-called Arabic dialects), which speakers mix through code-switching or code-mixing.

The project aims to create a lexicographic resource for Contemporary Written Arabic (CWA) that takes into account materials whose features are found in real-world written Arabic texts, regardless of any preliminary classification based on their linguistic nature. It thus provides a novel theoretical approach that overcomes the traditional description of the Arabic linguistic system in terms of diglossia and interprets Arabic as a single linguistic complex. A final proof-of-concept model will be produced, intended to be the first large-scale, validated resource on CWA: one that provides objective and substantial data for testing competing theories on the linguistic status of the Arabic language and that demonstrates the model's extensibility to fuller coverage of CWA.

The novel approach would allow Arabic to be analysed as other languages have been for some decades within the corpus linguistics tradition: as a language whose lexicon (and grammar) can be described in a variety-neutral way by analysing a representative corpus defined according to a set of external, objective criteria (such as timespan, genres and areas).

The resulting design of the lexical resource will encourage new approaches to Arabic language teaching and learning, overcoming the long-standing diglossia issue.


Funding programme:
PRIN 2020 – SH4

Funding body:

Grant agreement:
B57G22000800006 (Prot. 20204EJYRX)


CNR-ILC role:

Project coordinator:
Roma Tre University

Paola Baroni
Andrea Bellandi
Giulia Benotto
Elisa Gugliotta
Nadia Khlif