Next: Towards convergence Up: Transcription and representation needs Previous: The speech community

Differences

 

The main differences reviewed so far between the corpus linguistics community and the speech community in their approach to corpora containing spoken materials are summarized in the following table:

                  Corpus linguistics              Speech research
Materials         Unprepared, unelicited speech   Controlled, elicited speech
Scope             Discourse, dialogue             Utterance
Recordings        Natural environment             Controlled environment
Transcription     Orthographic enriched           Phonetic and orthographic,
                  (transcription)                 aligned with the speech
                                                  signal (labelling)
Oriented towards  Symbolic, categorical           Speech signal, temporal
                  representation                  representation

A discussion of other differences between collections of written and spoken data can be found in the chapter devoted to corpus design in the EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995). Seven main differences are outlined there, having to do with the following aspects:

Biber (1988) and Halliday (1989) contain a more in-depth discussion of differences between speaking and writing from a linguistic perspective.

As discussed in the next section, there has recently been a tendency towards integrating the needs of both communities, especially because the notion of a speech database used in speech research has gradually been broadened to encompass the large collections of more natural data that are characteristic of work in corpus linguistics. However, one should not forget the differences arising from the historical development of the two fields, which have led to an emphasis on elicited spoken language in the speech research community and on unelicited speech in corpus linguistics (Sinclair, 1993:68, 1994, 1996).


