Recommendations for a minimal set of encodings for spoken texts


Section 2.3 reviews transcription and representation practices for spoken texts, paying special attention to the NERC and TEI proposals. A survey of the events represented and encoded in spoken texts (2.3.1) shows that a considerable number of phenomena can be of interest to different types of research. However, it seems necessary to define a minimal set of events to be encoded according to the TEI-compliant Corpus Encoding Standard (CES) proposed for EAGLES (Ide, 1996). The present document is concerned only with the events themselves; the encoding of the International Phonetic Alphabet, of the transcription, and of the linguistic annotation of speech will be presented as part of CES. Proposals for the encoding of spoken texts within the TEI initiative can also be found in Johansson (1995a, b).

As a starting point, it should be noted that there are important differences between the transcription of read text - when the original written source is available - and the transcription of spontaneous speech. These differences are reviewed in detail in the EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995) and can be summarized in the following points:

Similar problems in the transcription of speech are mentioned by Johansson (1995b), who adds a further dimension: since speech is generally addressed to a limited audience in a private setting, adequate knowledge of the context and the situation is needed to understand it correctly.

Despite the difficulties involved in the transcription of unprepared speech, it should be possible to define a minimal common set of events to be encoded in the transcription of different types of spoken texts.

In section 2.4 the structural elements considered in the TEI Guidelines have been defined; they are listed again here for the reader's convenience:

The EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995) considers a set of non-linguistic phenomena that should be annotated when transcribing a speech corpus:

A comparison of these recommendations shows that some elements are common to both proposals and could therefore form part of the minimal set of elements to be encoded. These elements are the following:

Vocal semi-lexical events

Vocal non-lexical events

Non-vocalised non-communicative events

Note that the first two categories correspond to those subsumed under the tag <vocal> in the TEI, while the third corresponds to <event>.
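
The correspondence just described can be sketched as a small lookup table. This is only an illustration: the function name `encode_event`, the use of a `desc` attribute, and the XML-style empty-element syntax are assumptions made for the example, not part of the recommendation.

```python
from xml.sax.saxutils import quoteattr

# Mapping from the three shared categories to the TEI element that
# subsumes them, as stated in the text above.
CATEGORY_TO_TEI = {
    "vocal semi-lexical": "vocal",
    "vocal non-lexical": "vocal",
    "non-vocalised non-communicative": "event",
}


def encode_event(category, description):
    """Return an empty TEI-style element for one transcribed event.

    The `desc` attribute and the XML empty-element form are assumptions
    of this sketch.
    """
    element = CATEGORY_TO_TEI[category]
    return f"<{element} desc={quoteattr(description)}/>"


print(encode_event("vocal non-lexical", "cough"))
print(encode_event("non-vocalised non-communicative", "door slams"))
```

A table of this kind keeps the transcriber's categories separate from the element names, so that a project can refine the categories without changing the encoding.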

The transcription of spoken interactions where more than one speaker is involved also requires the consideration of the following elements:

Speaker identity

Speaking turns, indicating a change of speaker

Simultaneous speech or overlapping
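
As an illustration of how these three elements might appear together, the following sketch builds one TEI-style `<u>` element per speaking turn. The `who` attribute carries the speaker identity and `trans="overlap"` marks simultaneous speech; the exact attribute values, the `<body>` wrapper, and the function name `encode_turns` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def encode_turns(turns):
    """Encode speaking turns as TEI-style <u> elements.

    `turns` is a list of (speaker_id, text, overlaps_previous) tuples;
    this input format is hypothetical.
    """
    body = ET.Element("body")
    for speaker, text, overlaps_previous in turns:
        # A new <u> is opened at every change of speaker.
        u = ET.SubElement(body, "u", {"who": speaker})
        if overlaps_previous:
            # Mark speech that starts while the previous turn is still going.
            u.set("trans", "overlap")
        u.text = text
    return ET.tostring(body, encoding="unicode")


print(encode_turns([
    ("A", "have you read the report", False),
    ("B", "yes I have", True),
]))
```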

A third group of elements to be transcribed is related to the performance of the speaker. The advisability of including them in transcriptions is discussed in the EAGLES Handbook on Spoken Language Systems, where three different types of phenomena are identified:

Omissions in read text


Word fragments

Moreover, the encoding of spoken texts should document the difficulties encountered during the transcription process. The NERC proposals mention `guessed' and `unintelligible fragments', while the SpeechDat conventions include a notational device for partially or totally unintelligible words. It also seems appropriate to provide a means of recording the transcriber's uncertainties:

Unintelligible fragments

Finally, the encoding of utterances - defined as a stretch of speech usually preceded and followed by a pause or by a change of speaker - should be considered. We have already recommended the marking of changes of speaker, and in section 5.2.2, devoted to prosody, it is also proposed that pauses should be part of the elements to be encoded. This implies that utterances are necessarily encoded, since they are delimited by these elements.
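
The reasoning above, that utterance boundaries follow from pauses and changes of speaker, can be sketched as a simple segmentation routine. The token format, the `PAUSE` marker and the function name `segment_utterances` are assumptions made for this illustration; real transcriptions would need a richer representation.

```python
# Hypothetical marker standing for an encoded pause in the token stream.
PAUSE = "<pause>"


def segment_utterances(tokens):
    """Group (speaker, item) tokens into utterances.

    A boundary is placed at every pause and at every change of speaker,
    following the definition of the utterance given in the text.
    """
    utterances = []
    current_speaker, current_words = None, []
    for speaker, item in tokens:
        boundary = item == PAUSE or speaker != current_speaker
        if boundary and current_words:
            utterances.append((current_speaker, current_words))
            current_words = []
        current_speaker = speaker
        if item != PAUSE:
            current_words.append(item)
    if current_words:
        utterances.append((current_speaker, current_words))
    return utterances


tokens = [("A", "well"), ("A", "yes"), ("A", PAUSE), ("A", "maybe"), ("B", "no")]
for speaker, words in segment_utterances(tokens):
    print(speaker, " ".join(words))
```

In this sketch no separate utterance markup has to be entered by the transcriber, which is the point made in the paragraph above.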

An important point which has to be considered is the usability of the TEI recommendations from the point of view of the transcriber. Sinclair (1995) and Chafe (1995) discuss this issue, which is also mentioned by the EAGLES Spoken Language Working Group. As a general rule, a balance between the advantages offered by the TEI, the aims of the corpus and the demands imposed on the transcriber should be sought. The distinction put forward by Sinclair (1995:107) between conformity and compatibility with TEI is useful in clarifying the debate. In fact, the need to develop conversion software between a user-friendly system of transcription and the TEI encoding scheme was one of the recommendations arising from the EAGLES Workshop on `Issues in Corpus Work' organized by the Text Corpora Working Group in Madrid in January 1996.
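
To make the idea of conversion software more concrete, the following sketch converts one line of a simple transcriber-oriented notation into TEI-style markup. The notation itself is invented for this example (`SPEAKER: text`, `[cough]` for a vocal event, `((word))` for a doubtful reading) and does not reproduce any existing convention; the `desc` attribute and the XML-style syntax are likewise assumptions.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical user-friendly notation: "SPEAKER: text", where [..] marks
# a vocal event and ((..)) marks a fragment the transcriber is unsure of.
LINE = re.compile(r"^(?P<who>\w+):\s*(?P<text>.*)$")
TOKEN = re.compile(r"\[(?P<vocal>[^\]]+)\]|\(\((?P<unclear>.+?)\)\)")


def to_tei(line):
    """Convert one line of the hypothetical notation to a TEI-style <u>."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a 'SPEAKER: text' line: {line!r}")
    text = match["text"]
    parts, pos = [], 0
    for tok in TOKEN.finditer(text):
        # Plain text between markers is escaped and copied through.
        parts.append(escape(text[pos:tok.start()]))
        if tok["vocal"] is not None:
            parts.append(f"<vocal desc={quoteattr(tok['vocal'])}/>")
        else:
            parts.append(f"<unclear>{escape(tok['unclear'])}</unclear>")
        pos = tok.end()
    parts.append(escape(text[pos:]))
    return f"<u who={quoteattr(match['who'])}>{''.join(parts)}</u>"


print(to_tei("A: well [cough] yes ((maybe))"))
# prints: <u who="A">well <vocal desc="cough"/> yes <unclear>maybe</unclear></u>
```

A converter of this kind lets the transcriber work in a compact notation while the stored corpus remains compatible with the TEI scheme, in the sense of the distinction drawn by Sinclair.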

