Proposals for the transcription of the suprasegmental level

In a classic book on English intonation, Crystal (1969) discusses the principles that should guide a prosodic transcription system. According to him, such a system should:

- be accurate;
- be consistent;
- be as automatically applicable as possible;
- use the minimum of symbols;
- establish degrees of complexity of symbols to reflect the different significance attached to the data; and
- be broad, covering only those aspects which are linguistically significant.

On the whole, most of the systems examined in the previous section fulfill the conditions proposed by Crystal. However, different transcription aims may require different systems, so specific recommendations may be needed according to the type of research for which the prosodic transcription is required.

The first issue to be discussed in this section is which prosodic events should be encoded in a spoken text. The following elements seem to be common to many prosodic transcription systems:

- Prosodic boundaries and prosodic units
- Tone or pitch level, terminal and non-terminal
- Pitch movements, pitch direction or pitch contour, both local and global
- Accent, at word or phrase level
- Lengthening
- Pauses

The Text Encoding Initiative (see 2.3.3) proposes the encoding of three elements which can be related to prosody:

- Utterance, defined as a stretch of speech usually preceded and followed by a pause or by a change of speaker
- Pause
- Shift, which may be used to signal changes in paralinguistic features: voice quality, loudness, pitch range and speech rate

We have seen, moreover, that stress and pitch patterns can be represented. The symbols used for this purpose are punctuation marks in the example provided by Sperberg-McQueen & Burnard (Eds.) (1994): <.> for a low-fall intonation, <,> for a fall-rise, <?> for a low rise, <!> for a rise-fall, and <:> for a lengthened syllable.
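The punctuation conventions just listed amount to a small lookup table. The following is a minimal sketch of how such marks might be decoded in a transcribed string; the function name `decode_marks` and the sample string are assumptions made for illustration, and the sketch does not attempt to reproduce how the marks are attached to syllables in the cited example.

```python
# Punctuation-mark conventions as listed in the text above.
PROSODIC_MARKS = {
    ".": "low-fall",
    ",": "fall-rise",
    "?": "low rise",
    "!": "rise-fall",
    ":": "lengthened syllable",
}


def decode_marks(transcription):
    """Return (position, mark, meaning) for each prosodic mark found.

    Hypothetical helper: it scans the string character by character and
    reports every character that appears in PROSODIC_MARKS.
    """
    return [
        (i, ch, PROSODIC_MARKS[ch])
        for i, ch in enumerate(transcription)
        if ch in PROSODIC_MARKS
    ]


# Invented sample input, not taken from any corpus.
for position, mark, meaning in decode_marks("yes, I know."):
    print(position, mark, meaning)
```

A table of this kind is easy to invert or extend, which matters if the same text must later be converted to another notation.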

The proposal adopted by the Network of European Reference Corpora (NERC) (see 2.5.2) includes prosodic information in levels III and IV:

- Tone unit boundaries (Level III)
- Tonic syllables (Level III)
- Tones (Level IV)
- Head syllables (Level IV)

Taken together, TEI and NERC allow for the representation of global prosodic phenomena. When a more detailed representation is needed, the NERC report suggests the use of SAMPROSA (SAM Prosodic Alphabet) (Teubert, 1993; Sinclair, 1993).

In terms of prosody encoding, one recommendation would be to represent, at least, the two TEI elements Utterance and Pause (see 4). Other prosodic phenomena, such as those mentioned in NERC levels III and IV, seem to be more adequately handled by a transcription system such as SAMPROSA. Among these phenomena, at least tone unit or tone group boundaries and stress (or tonic syllables) could be included in a transcription containing basic prosodic information.
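As a rough illustration of this minimal encoding, the sketch below builds one utterance containing a pause using Python's standard `xml.etree.ElementTree` module. It assumes that the TEI Utterance and Pause elements are written as `u` and `pause`, and that the speaker is recorded in a `who` attribute; the helper `encode_utterance`, the speaker label and the sample words are all invented for the example.

```python
import xml.etree.ElementTree as ET


def encode_utterance(speaker, segments):
    """Build a <u> element from text chunks and pause markers.

    Hypothetical helper. `segments` is a list whose items are either a
    string (transcribed words) or None (a pause at that point).
    """
    u = ET.Element("u", {"who": speaker})
    last_pause = None  # most recent <pause> child, if any
    for seg in segments:
        if seg is None:
            last_pause = ET.SubElement(u, "pause")
        elif last_pause is None:
            # Text before the first pause belongs to the utterance itself.
            u.text = (u.text or "") + seg
        else:
            # Text after a pause is stored as that pause's tail.
            last_pause.tail = (last_pause.tail or "") + seg
    return u


# Invented sample: speaker "A" pauses once in mid-utterance.
u = encode_utterance("A", ["well ", None, " I think so"])
print(ET.tostring(u, encoding="unicode"))
```

Tone unit boundaries and tonic syllables are deliberately left out here, since the text assigns them to a dedicated prosodic transcription system rather than to this basic layer.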

As far as the transcription system to be used is concerned, it is worth quoting the opinion of the EAGLES Spoken Language Working Group:

It is reasonable to assume nowadays that a prosodic transcriber will have access to at least the waveform and the F0 curve for the speech to be transcribed. In that case, the recommendation is to use either the ToBI or the IPO system (and the MARSEC system if a purely auditory transcription is being carried out). If the language to be transcribed is not English, and especially if the projected application of the prosodic transcription is in the field of speech technology, then it is probably best to use the IPO system if possible (i.e., if the basic "grammar" of contours has already been researched for that language). However, these can only be provisional recommendations, as little work has been carried out in prosodic labelling. In this situation, it may be that a different system entirely will prove more appropriate to the given language, and it is not possible to make absolute recommendations.

More important than the choice of a particular system is the acknowledgement of how difficult it is to provide recommendations in this area given the present state of the art. Although ToBI is rapidly becoming a standard, despite its orientation towards the transcription of English and the theoretical phonological assumptions underlying it, SAMPROSA offers the advantages of being accepted by NERC and of having been developed with both linguistic and speech technology needs in mind. However, the diversity of current proposals can be overcome by developing mappings between systems that allow conversion between them. This was one of the recommendations that emerged from the Madrid workshop 'Issues in Corpus Work', organised by the Text Corpus Working Group in January 1996.

In terms of the dichotomies presented in the previous sections, it would be advisable to choose a multi-tiered, machine-readable and multilingual prosodic transcription system. If it can be applied automatically instead of relying on the judgement of the transcriber, this would be an important advantage in the labelling of large corpora.
