Motivation and Topics
Sustaining digital language diversity is a truly inter-disciplinary endeavor, which crucially involves the language resources community. Language resources and technologies (LRTs) are a cornerstone of digital language diversity. Availability of good-quality language resources and tools, effective techniques for rapid development of basic language resources and for adaptation of natural language processing (NLP) tools to language varieties, standards, guidelines and best practices for the development of multilingual lexicons and terminologies, but also infrastructures and environments supporting easy sharing of resources, are among the essential components of any truly multilingual society.
However, in view of the limited strategic and economic interest in many languages other than the major ones and the pervasiveness of the data-driven paradigm in NLP, it can hardly be expected that language technologies and applications for under-resourced languages will be provided in the future without a strong commitment and concrete actions on the side of all stakeholders, including under-resourced language speakers, language practitioners, academics and software developers.
It is only by joining forces among these that an Alliance for Digital Language Diversity can be established for the purpose of empowering speakers of under-resourced languages with the knowledge and abilities to create and share content on digital devices. This will increase the amount of content available for such languages, and therefore pave the way for software developers to provide state-of-the-art products and services allowing the use of these languages over digital devices.
Following the successful edition of the CCURL Workshop at LREC 2014, which saw the participation of 50 people with a program of 17 presentations, we would like to establish a series with the proposal for this year's workshop. The present proposal is made in the true spirit of Collaboration and Computing for Under-Resourced Languages (CCURL), as we broaden the scope of the workshop by exploring a wide variety of aspects, from infrastructural to institutional, connected to enhancing digital language diversity.
In this workshop, we solicit research contributions from academic and industrial researchers, digital language technology developers, researchers and professionals in language learning and domain-specific digital language applications, policy makers, language activists, and speakers' associations, broadly addressing the research, development and innovation required to enlarge the opportunities for an increasing number of languages to be used and usable over digital devices.
Relevant topics, pertaining to digital language diversity, include, but are not limited to:
- digital infrastructure, including LRTs: gaps, quantity, quality, availability, suitability, reusability, sustainability, maintainability, etc.;
- LRT research and development: experiences in the development of LRTs for under-resourced languages; use and usability of minority languages in the social media;
- domain-specific resources and applications for and in under-resourced languages: education (e.g. e-books), entertainment, publishing, government, commerce, finance, etc.;
- policies, guidelines and best practices for the development of digital language diversity;
- advocacy and digital language planning: speakers of under-resourced languages often lack not only the infrastructural requirements for using their languages over digital media and devices, but also the necessary assertiveness to use their language instead of or in addition to a dominant one (be it the State language, or another major one);
- digital cultural diversity through digital language diversity;
- new methods and modalities towards digital language diversity, including crowdsourcing, open data and free/open-source and other cooperative approaches, as well as semantic web technologies;
- significance and implications of digital language diversity for the Internet of Things.