Computer phonetic alphabets

The increasing need for electronic exchange of texts containing phonetic transcriptions has led to the computer coding of the International Phonetic Alphabet (Esling, 1988, 1990; Esling & Gaylord, 1993; IPA, 1989). Each IPA symbol has been assigned a numerical equivalent, its IPA number, and translation tables can be developed to relate ASCII codings to IPA numbers. These mappings are part of the CRIL (Computer Representation of Individual Languages) conventions, which are also discussed in the chapter on corpus representation of the EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995). Worldbet, developed by Hieronymus (1994), is another proposal for the ASCII coding of phonetic symbols; it has been used in the 22 Language Telephone Speech Corpus developed by the Oregon Graduate Institute.
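To make the idea of a translation table more concrete, the following minimal Python sketch maps an ASCII coding (SAMPA-style codes are assumed here) to IPA numbers, and IPA numbers to Unicode IPA characters. The table and function names are hypothetical, only a handful of symbols are covered, and the IPA numbers shown are our reading of the IPA number chart; they should be checked against the IPA Handbook before any reuse.

```python
# Minimal sketch: IPA numbers used as a pivot between an ASCII coding and IPA symbols.
# The symbol subset is illustrative; verify the IPA numbers against the IPA Handbook.

# ASCII coding -> IPA number (one such table per ASCII scheme; SAMPA-style codes assumed)
SAMPA_TO_IPA_NUMBER: dict[str, int] = {
    "p": 101,
    "t": 103,
    "k": 109,
    "m": 114,
    "n": 116,
    "s": 132,
    "S": 134,
    "i": 301,
    "a": 304,
    "@": 322,
}

# IPA number -> Unicode IPA symbol (shared by every ASCII scheme)
IPA_NUMBER_TO_SYMBOL: dict[int, str] = {
    101: "p",
    103: "t",
    109: "k",
    114: "m",
    116: "n",
    132: "s",
    134: "\u0283",  # esh
    301: "i",
    304: "a",
    322: "\u0259",  # schwa
}


def to_ipa_numbers(codes: list[str], table: dict[str, int]) -> list[int]:
    """Translate a list of ASCII codes into IPA numbers; unknown codes raise KeyError."""
    return [table[code] for code in codes]


def to_symbols(numbers: list[int]) -> str:
    """Render a list of IPA numbers as a Unicode IPA string."""
    return "".join(IPA_NUMBER_TO_SYMBOL[n] for n in numbers)


if __name__ == "__main__":
    numbers = to_ipa_numbers(["S", "i", "p"], SAMPA_TO_IPA_NUMBER)
    print(numbers, to_symbols(numbers))
```

One reason for routing everything through IPA numbers is that each ASCII scheme then needs only its own table into the numbers. Supporting another scheme, such as Worldbet, would mean writing one more table rather than a converter for every pair of schemes.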

Other systems have been developed for specific goals. For example, CHILDES (Child Language Data Exchange System), a project aimed at collecting samples of children's language, uses PHONASCII (Allen, 1988), a coding system comprising a phonemic alphabet (UNIBET) and a phonetic alphabet, so that both broad and narrow transcriptions are possible (more information on CHILDES is available at URL http://poppy.psy.cmu.edu/childes/).

Within the ESPRIT project Linguistic analysis of European languages, a Computer Phonetic Alphabet (CPA) based on the IPA was developed for seven European languages (Kluger-Kruse, 1987).

However, the main effort to provide a computer-readable transcription system covering the phonemic inventories of most European languages has been made within the ESPRIT SAM (Speech Assessment Methodology) projects. The SAM Phonetic Alphabet (SAMPA) defines a set of ASCII codings corresponding to the IPA symbols needed for the phonemic transcription of all the major European languages included in the EUROM corpus (Chan et al., 1995; more information is available at URL http://www.phon.ucl.ac.uk/resource/eurom.html), and it is being used successfully in other European and national projects. SAMPA is described in Wells (1987, 1989) and Wells et al. (1992), and is also fully discussed in an appendix to the EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995). A presentation of the system, together with the SAMPA adaptations to Danish, Dutch, English, French, German, Greek, Italian, Norwegian, Portuguese, Spanish and Swedish and their ASCII and IPA equivalents, can also be found at URL http://www.phon.ucl.ac.uk/home/sampa/home.htm.
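As an illustration of how SAMPA codings relate to IPA symbols, here is a minimal Python sketch that converts a SAMPA transcription into Unicode IPA. It covers only a small subset of the SAMPA symbols, the names are hypothetical, and it assumes that every base symbol in the subset is a single ASCII character, with the length mark ":" and the nasalization mark "~" written after the symbol they modify.

```python
# Minimal sketch: SAMPA -> Unicode IPA for a small, illustrative subset of symbols.
# Assumes every base symbol in the subset is one ASCII character.

SAMPA_TO_IPA: dict[str, str] = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "m": "m", "n": "n", "N": "\u014b",              # eng
    "f": "f", "v": "v", "s": "s", "z": "z",
    "S": "\u0283", "Z": "\u0292",                   # esh, ezh
    "T": "\u03b8", "D": "\u00f0",                   # theta, eth
    "i": "i", "e": "e", "E": "\u025b",              # open-mid front vowel
    "a": "a", "O": "\u0254", "o": "o", "u": "u",    # open-mid back rounded vowel
    "I": "\u026a", "U": "\u028a", "@": "\u0259",    # lax i, lax u, schwa
    "{": "\u00e6", "V": "\u028c",                   # ash, turned v
}

# Modifiers written after the symbol they modify.
SAMPA_MODIFIERS: dict[str, str] = {
    ":": "\u02d0",  # length mark
    "~": "\u0303",  # combining tilde (nasalization)
}


def sampa_to_ipa(transcription: str) -> str:
    """Convert a SAMPA string character by character; spaces are copied unchanged."""
    out = []
    for ch in transcription:
        if ch == " ":
            out.append(ch)
        elif ch in SAMPA_TO_IPA:
            out.append(SAMPA_TO_IPA[ch])
        elif ch in SAMPA_MODIFIERS:
            out.append(SAMPA_MODIFIERS[ch])
        else:
            raise ValueError(f"unknown SAMPA symbol: {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    for word in ["Si:p", "TIN", "bO~"]:
        print(word, sampa_to_ipa(word))
```

For example, the English SAMPA transcription "Si:p" (sheep) becomes ʃiːp, and the French "bO~" (bon) becomes bɔ̃. Raising an error on unknown characters, rather than copying them through, makes gaps in the table visible instead of silently producing mixed ASCII and IPA output.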

SAMPA, like the IPA, rests on the phonemic principle: it represents only those sounds which serve to distinguish word meanings in a given language. This is also in accordance with the principle of phonotypical transcription discussed in 2.5.1. Nevertheless, the current set of symbols also allows the phonetic notation of certain allophones, although this is discouraged for methodological reasons.

Wells (1995) has recently proposed an extension of SAMPA known as X-SAMPA (described at URL http://www.phon.ucl.ac.uk/home/sampa/home.htm/x-sampa.htm). It consists of a keyboard-compatible coding of the entire set of IPA symbols, including diacritics and tone marks, and is intended specifically for the electronic transmission of materials transcribed in the International Phonetic Alphabet.
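Because X-SAMPA codes many IPA symbols and diacritics as sequences of several ASCII characters (for example "r\" for ɹ, "s`" for ʂ and "_h" for aspiration), converting an X-SAMPA string calls for tokenization rather than a character-by-character lookup. The following Python sketch uses greedy longest match over a small, illustrative subset of the symbol set; the table and function names are hypothetical, and a real converter would need the complete X-SAMPA table.

```python
# Minimal sketch: X-SAMPA -> Unicode IPA by greedy longest match.
# Only a small, illustrative subset of X-SAMPA is included.

XSAMPA_TO_IPA: dict[str, str] = {
    "p": "p", "t": "t", "k": "k", "s": "s", "i": "i", "a": "a",
    "r": "r",
    "r\\": "\u0279",    # turned r
    "r\\`": "\u027b",   # turned r with retroflex hook
    "t`": "\u0288",     # retroflex t
    "s`": "\u0282",     # retroflex s
    "?": "\u0294",      # glottal stop
    "S": "\u0283",      # esh
    "@": "\u0259",      # schwa
    "_h": "\u02b0",     # aspiration diacritic (superscript h)
    ":": "\u02d0",      # length mark
}

# Longest code in the table bounds how far ahead we need to look.
MAX_LEN = max(len(code) for code in XSAMPA_TO_IPA)


def xsampa_to_ipa(transcription: str) -> str:
    """At each position, consume the longest code found in the table."""
    out = []
    pos = 0
    while pos < len(transcription):
        for length in range(min(MAX_LEN, len(transcription) - pos), 0, -1):
            chunk = transcription[pos:pos + length]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                pos += length
                break
        else:
            raise ValueError(f"unknown X-SAMPA code at position {pos}: {transcription[pos]!r}")
    return "".join(out)


if __name__ == "__main__":
    for code in ["p_ha", "r\\a", "r\\`", "s`a:"]:
        print(code, xsampa_to_ipa(code))
```

Trying the longest candidate first matters because the codes overlap: "r", "r\" and "r\`" denote three different sounds, so a shortest-match scan would split "r\`" after the "r" and fail on the remaining characters.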

In the context of phonetic transcription systems applied to speech technology, the standards adopted within the ONOMASTICA project (Schmidt et al., 1993) for the transcription of proper names are also worth mentioning.