The IEEE International Workshop & Panel on Digital Language Resources for Scholars was held on Thursday 10th June 2021 within IEEE CiSt'20, the 6th IEEE Congress on Information Science & Technology (05-12/06/2021).
Originally planned to take place in Agadir (Morocco) but held virtually because of the COVID-19 pandemic, the event was organized within the collaboration that, since 2014, has linked the University Sidi Mohamed Ben Abdellah (USMBA), the IEEE CiSt Congress in Morocco and the Institute for Computational Linguistics "A. Zampolli" of the National Research Council of Italy (ILC-CNR). Its aim was to extend the Common Language Resources and Technology Infrastructure (CLARIN), classified as an ESFRI Landmark among Social Sciences and Humanities research infrastructures (SSH RI), to university centres in Morocco, to federate and share existing and new data, to expand collaborations and to leverage resources for collaborative research. The event was therefore intended to open a discussion on how to make Language Resources produced in Morocco more visible and accessible to a broader research community, and on how the experience and resources gained in setting up the various CLARIN data and competence centres could serve this purpose.
Although CLARIN is a European research infrastructure, it is not limited to European languages or resources. Several European CLARIN centres make resources in non-European languages available, and the South African Centre for Digital Language Resources (SADiLaR) recently joined CLARIN. CLARIN also has partnerships with centres in the USA, and its deposit, metadata and single sign-on framework offers a viable solution for anyone wishing to set up a repository of Language Resources easily. Language Resources for Arabic and its varieties can be found via the CLARIN Virtual Language Observatory (VLO), a meta-catalogue that harvests the metadata of CLARIN centres and makes them searchable from a single access point. These metadata may describe oral recordings, written corpora or lexicons. However, many important corpora and lexical resources are not yet represented. Moreover, the CLARIN Language Resources Switchboard, a tool that helps users find Web applications for Natural Language Processing (NLP), currently lacks any NLP tool for Arabic.
The event consisted of short presentations giving an overview of CLARIN ERIC and its various aspects, showing in particular how its technical and scientific infrastructure complies with the internationally recognized FAIR principles, which recommend that data be Findable, Accessible, Interoperable and Reusable. Examples of resources and tools from various national consortia were also presented, notably from CLARIN-IT (the Italian node of CLARIN, of which ILC-CNR is the Executing Institution), along with user involvement activities and ongoing projects such as ParlaMint. The presentations were followed by a panel discussion.
Simonetta Montemagni (ILC-CNR Director), Monica Monachini (ILC-CNR, CLARIN-IT National Coordinator), Ouafae Nahli (ILC-CNR), Maha El Biadi (USMBA) and Francesca Frontini (ILC-CNR, CLARIN ERIC Board of Directors) took part in the event as panelists.