The relationship between language and the Internet, specifically the web of documents and the web of data as well as the emerging Internet of Things, is a growing area of research, development, innovation and policy interest. The emerging picture is one in which language profoundly affects a person’s experience of the Internet by determining the amount of accessible information and the range of available services, for example by shaping the results of a search engine, and the number of everyday tasks (such as buying a ticket, reviewing opinions about hotels and restaurants, or purchasing books and other goods) that can be carried out virtually.
The extent to which a language can be used on the Internet or the Web not only affects a person’s experience and choice of opportunities; it also affects the language itself. If a language is poorly or insufficiently supported on digital devices, for instance if the device keyboard lacks the characters and diacritics needed to write in the language, or if no spell checker exists for it, then its usability is severely affected, and it might never be used online. The language could become “digitally endangered”, and its value and profile could be lessened, especially in the eyes of new generations. On the other hand, concerted efforts to develop a language technologically could contribute to its digital ascent and digital vitality, and therefore to digital language diversity. Crosslingual technologies such as machine translation facilitate communication with people who do not speak a language, and therefore sustain the use of that language by its native speakers.
These considerations call for a closer examination of a number of related issues.
First, the issue of “digital language diversity”: the Internet appears to be far from linguistically diverse. With a handful of languages dominating the Web, there is a linguistic divide that parallels and reinforces the digital divide.
The amount of information and services that are available in digitally less widely used languages is reduced, thus creating inequality at several different levels:
- unequal access to technological development;
- inequality of information and access to services;
- inequality of linguistic rights and digital opportunities for all languages and all citizens;
- unequal digital dignity, i.e. uneven perception of a language’s importance as a function of its presence on digital media;
- unequal opportunities for digital language survival;
- cultural bias of information on the Web.
Second, it is important to reflect on the conditions that make it possible for a language to be used on digital devices, and on what can be done to extend this possibility to languages other than the so-called “major” ones. Despite its increasing penetration in daily applications, language technology is still under development even for these major languages, and at the current pace of technological development there is a serious risk that some languages will be left without advanced technological solutions such as smart personal assistants, adaptive interfaces, or speech-to-speech translation. We refer to such languages as under-resourced. The notion of digital language diversity may therefore be interpreted as a digital universe that allows the comprehensive use of as many languages as possible.