The speech community

Within the speech community, the emphasis so far has been on speech databases (Carré, 1992) rather than on spoken corpora in the sense used in the previous section. This reflects the need for controlled speech data, either for basic research aimed at modelling and describing the articulatory and acoustic properties of speech or, in the field of speech technology, for deriving data for speech synthesis and for building up material to train and test speech recognition, speaker recognition/verification or spoken language dialogue systems. See the chapter on corpus design in the EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995) for a review of the applications of spoken language corpora, and Lamel - Cole (1996) for a survey of recent activities in the area of speech corpora.

Moore (1991) offers a typology of the recorded speech usually encountered in speech research:

Analytic-diagnostic material designed to get basic information on the articulatory and acoustic features of speech (e.g. lists of consonant-vowel combinations);
General purpose material for speech technology applications (e.g. vocabularies); and
Task-specific materials pertaining to different discourse domains and oriented towards the needs of applications in man-machine communication (e.g. train timetable inquiries).

A more detailed account of the linguistic content of these types of corpora is provided in the chapter on corpus design in the EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995), which distinguishes the following types of material:

Read aloud isolated items: phonemes, words, sentences or text fragments;
Semi-spontaneous speech;
Spontaneous speech about a predetermined subject;
Simulated person-machine interactions;
Spontaneous speech.

The central issue here is the speech signal itself. Its symbolic representation is usually made by means of a phonetic alphabet -- the IPA or a computer-readable equivalent being the commonly agreed international system (see 5.1.1) -- which allows the phonetic modifications that words undergo when spoken in context to be represented. The speech wave is first segmented into units that can be related to phonetic symbols; each unit is then labelled, so that a symbol representing a set of phonetic categories is temporally synchronized with a given part of the signal. This process is known as alignment. The phonetic representation can also be related to the orthographic representation, which is thereby aligned with the speech signal as well. Although alignment used to be done manually by expert phoneticians, it can now be performed (semi-)automatically, depending on the type of speech; manual verification is still needed, however, to achieve the required accuracy.
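The result of segmentation and labelling can be pictured as a tier of time-stamped labels. The following Python sketch is only an illustration under stated assumptions: the `Segment` class, the `label_at` and `word_tier` helpers, the SAMPA-like phone symbols and all the time values are hypothetical and are not taken from any particular speech database or labelling tool.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One labelled stretch of the signal (times in seconds)."""
    start: float
    end: float
    label: str


def label_at(tier, t):
    """Return the label of the segment covering time t, or None.

    Assumes the tier is sorted by start time and does not overlap.
    """
    starts = [seg.start for seg in tier]
    i = bisect_right(starts, t) - 1
    if i >= 0 and tier[i].start <= t < tier[i].end:
        return tier[i].label
    return None


def word_tier(phones, words):
    """Derive an orthographic (word) tier from a phone tier.

    `words` is a list of (orthographic form, number of phones) pairs;
    each word inherits the start of its first phone and the end of
    its last phone, so it is aligned with the signal as well.
    """
    out, i = [], 0
    for form, n in words:
        group = phones[i:i + n]
        if n < 1 or len(group) != n:
            raise ValueError(f"cannot map {n} phone(s) to {form!r}")
        out.append(Segment(group[0].start, group[-1].end, form))
        i += n
    if i != len(phones):
        raise ValueError("phones left over after the last word")
    return out


# Invented example: "the cat" with SAMPA-like symbols and made-up times.
phones = [
    Segment(0.00, 0.08, "D"),
    Segment(0.08, 0.13, "@"),
    Segment(0.13, 0.25, "k"),
    Segment(0.25, 0.36, "{"),
    Segment(0.36, 0.45, "t"),
]
words = word_tier(phones, [("the", 2), ("cat", 3)])
```

With these tiers, `label_at(phones, 0.10)` looks up the phone symbol at a given instant and `label_at(words, 0.30)` the orthographic word at another, which is the sense in which both representations are aligned with the signal. In practice the segment boundaries would come from manual labelling or from a (semi-)automatic aligner followed by manual verification.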

Corpora with the characteristics described in this section are sometimes called speech corpora (Sinclair, 1994, 1996).
