
The Text Encoding Initiative (TEI) recommendations

 

A chapter of the TEI Guidelines is devoted to the transcription of spoken texts (Sperberg-McQueen & Burnard (Eds.), 1994). It describes the basic structure of the TEI representation of a spoken text -- header, text and divisions -- and defines ways to signal basic structural elements: contextual information, temporal information, utterances, pauses, semi-lexical and non-lexical vocalized elements, kinesic events, other types of communicative events and text presented in written form to the speaker. Guidelines on segmentation and alignment are also provided, together with recommendations for the transcription of speaker overlaps, word forms, prosody and paralinguistic features -- tempo, loudness, pitch range, tension, rhythm and voice quality -- and disfluencies. For the representation of phonetic information, use of the International Phonetic Alphabet (IPA) is recommended.
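As a rough illustration of the structural elements listed above, the following Python sketch parses a small invented fragment and lists each utterance with its speaker and the number of pauses it contains. The element and attribute names (u, who, pause, dur, vocal, kinesic, desc) are assumptions modelled on the TEI spoken-text tag set; the fragment is not taken from any real corpus, it omits the header, and it is written as plain XML rather than validated against the TEI DTD.

```python
import xml.etree.ElementTree as ET

# Invented fragment for illustration only. Element and attribute names are
# assumptions modelled on the TEI spoken-text tag set; the header is omitted.
SAMPLE = """
<text>
  <div>
    <u who="A">so you went <pause dur="short"/> home</u>
    <u who="B"><vocal desc="laugh"/> yes I did</u>
    <kinesic who="A" desc="nods"/>
  </div>
</text>
"""


def summarize(xml_string):
    """Return a (speaker, words, pause_count) tuple for each utterance."""
    root = ET.fromstring(xml_string)
    rows = []
    for u in root.iter("u"):
        # Join all text inside the utterance and normalize whitespace.
        words = " ".join("".join(u.itertext()).split())
        # Count pause elements that are direct children of the utterance.
        pauses = len(u.findall("pause"))
        rows.append((u.get("who"), words, pauses))
    return rows


for speaker, words, pauses in summarize(SAMPLE):
    print(speaker, repr(words), pauses)
```

Non-lexical events such as the vocal and kinesic elements carry no transcribed words, so they do not contribute to the extracted text; a fuller treatment would report them separately.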

Johansson (1995a, b) provides a clear overview and discussion of the TEI conventions for the encoding of spoken texts. More information on the TEI can be found at the following URLs: http://etext.virginia.edu/TEI.html; http://www-tei.uic.edu/orgs/tei; http://info.ox.ac.uk/archive/teilite.

An example of a TEI-conformant transcription of a Spanish spoken corpus can be found in Marcos Marín et al. (1993), included on the CD-ROM produced by the European Corpus Initiative (ECI); more information on this initiative is available at URL http://www.cogsci.ed.ac.uk/elsnet/eci.html. The British National Corpus (Crowdy, 1995) is a major initiative using TEI-conformant transcriptions for spoken language. More information on this corpus can be found at URL http://sable.ox.ac.uk/bnc/index.html or at URL http://info.ox.ac.uk/bnc/.