Towards convergence



A relatively recent development in speech research has been the study of speech obtained in more natural situations, although a controlled recording environment still has to be maintained for the purposes of detailed acoustic analysis. This more natural type of speech has been labelled spontaneous speech to differentiate it from so-called laboratory speech, that is, speech material consisting of short read sentences prepared in advance by the researcher and recorded in laboratory conditions (Lindblom, 1987). To put it in Teubert's words:

The speech community has commenced to express their interest in large spoken language corpora. Even general purpose corpora of impromptu, unrehearsed, unscripted, non elicited informal conversations now seem to arouse some interest in speech research as they can be used as test-beds for speech recognition systems. (Teubert, 1993:4)

This has triggered research in the field known as speaking styles (Eskénazi, 1993), which in some respects is closely linked to research in pragmatics and sociolinguistics as well as to the typologies of texts developed in corpus linguistics (see Sinclair & Ball, 1995, for a discussion of style and text typology). Some of the speaking styles that are of current interest to speech research are identified by Moore (1991): read speech; spontaneous speech arising from directed monologue; spontaneous speech arising from a dialogue between human interlocutors; spontaneous speech arising from simulated human-computer interaction; spontaneous speech arising from real human-computer interaction; material that reflects the influence of physiological or environmental factors on the voice of the talker; speech collected from talkers representing a large range of age and accent groups; and speech collected from different microphones and microphone arrangements. Although some of the styles mentioned by Moore are still specific to the interests of speech technology (e.g. samples of human-computer interaction), dialogues and monologues have also been collected in linguistic work, with different aims. It seems, then, that both the materials and the scope of spoken corpora in speech research are being enlarged and are moving closer to the interests of corpus linguistics.

As far as representation is concerned, Moore (1991:3) points out that:

For many purposes (especially in speech technology) it has become clear that speech data can be very useful if accompanied by machine-readable annotations consisting, at the very least, of an orthographic transcription with paragraph or phrase level pointers into the acoustic data.

The interest in orthographic transcription can be explained by the existence of previously mentioned methods that allow the (semi-)automatic segmentation and labelling of the speech wave and the temporal alignment of the signal with the phonetic and the orthographic representation; thus, the signal in a large corpus can be conveniently accessed through the orthographic transcription. Moreover, speech recognition systems need language models to train the grammars included in the system; since large corpora of orthographically transcribed speech are required to obtain these models, this is another reason for the speech community to be interested in the orthographic representation (Moore, 1991:3; Atwell, 1996).
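The idea of reaching the signal through the orthographic transcription can be illustrated with a minimal sketch. It assumes that a word-level alignment is already available as a list of time-stamped words; the names AlignedWord, find_word and to_samples, the example times and the 16 kHz sampling rate are hypothetical and are not taken from Moore (1991) or from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One orthographic word with pointers into the acoustic data (hypothetical layout)."""
    word: str     # orthographic form
    start: float  # start time in seconds
    end: float    # end time in seconds


def find_word(alignment, word):
    """Return the (start, end) times of every occurrence of a word, ignoring case."""
    target = word.lower()
    return [(w.start, w.end) for w in alignment if w.word.lower() == target]


def to_samples(interval, sample_rate):
    """Convert a (start, end) interval in seconds into sample indices."""
    start, end = interval
    return (round(start * sample_rate), round(end * sample_rate))


# Invented alignment for illustration only.
alignment = [
    AlignedWord("the", 0.0, 0.25),
    AlignedWord("speech", 0.25, 0.75),
    AlignedWord("signal", 0.75, 1.25),
    AlignedWord("the", 1.5, 1.75),
]

# Look up a word in the transcription, then locate it in the signal.
for interval in find_word(alignment, "speech"):
    print(interval, to_samples(interval, 16000))
```

In a sketch of this kind the transcription acts as an index: a search over the text returns time intervals, and the intervals identify the corresponding stretch of the digitized signal.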

On the other hand, the corpus linguistics community has recently acknowledged the value of the technical advances in digital speech processing and in (semi-)automatic segmentation and labelling of the speech wave, together with the possibility of aligning the signal with the orthographic representation. It is important to mention here that a fundamental recommendation issuing from the Network of European Reference Corpora (NERC) is that the digitized speech signal should be included as a component of a corpus (Sinclair, 1993:65-70).

In summary, speech databases are becoming larger and are including more natural data together with an orthographic representation linked to the speech signal. At the same time, corpus linguistics can take advantage of the technology developed in speech research to automate the storage of, and access to, large quantities of spoken data, and to obtain a categorical representation in the form of a phonetic transcription or an orthographic representation. There is, then, a potential for sharing data between the two communities that is

entirely dependent on the agreement of satisfactory data format, transcription and annotation interchange standards (Moore, 1991:3).
