In a classic book on English intonation, Crystal (1969) discusses the principles that should guide a prosodic transcription system. According to him, these are the following:
On the whole, most of the systems examined in the previous section fulfill the conditions proposed by Crystal. However, different transcription aims may require different systems, and for this reason specific recommendations may be needed depending on the type of research for which the prosodic transcription is intended.
The first issue to be discussed in this section is which prosodic events should be encoded in a spoken text. The following elements seem to be common to many prosodic transcription systems:
The Text Encoding Initiative (see 2.3.3) proposes the encoding of three elements which can be related to prosody:
We have also seen that stress and pitch patterns can be represented. In the example provided by Sperberg-McQueen & Burnard (Eds.) (1994), punctuation marks are used for this purpose: <.> for a low-fall intonation, <,> for a fall-rise, <?> for a low-rise, <!> for a rise-fall, and <:> for a lengthened syllable.
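To illustrate how such a convention can be processed by machine, the following Python sketch maps each punctuation mark to the tone it stands for, using the correspondences listed above. It assumes, hypothetically, that each mark is written immediately after the word it applies to; the table name `TEI_PROSODY_SYMBOLS` and the function `annotate` are illustrative and are not part of the TEI Guidelines.

```python
# Symbol-to-tone table following the TEI example described above
# (Sperberg-McQueen & Burnard, 1994). The names are hypothetical.
TEI_PROSODY_SYMBOLS = {
    ".": "low-fall",
    ",": "fall-rise",
    "?": "low-rise",
    "!": "rise-fall",
    ":": "lengthened",
}


def annotate(tokens):
    """Split each token into (word, [prosodic labels]).

    Assumption: marks are appended directly to the word they apply to,
    e.g. "well:" or "so.". Several marks may be stacked ("so:.").
    """
    result = []
    for token in tokens:
        word, labels = token, []
        # Strip trailing marks one at a time, keeping their original order.
        while word and word[-1] in TEI_PROSODY_SYMBOLS:
            labels.insert(0, TEI_PROSODY_SYMBOLS[word[-1]])
            word = word[:-1]
        result.append((word, labels))
    return result
```

For instance, `annotate(["well:", "I", "think", "so."])` yields `[("well", ["lengthened"]), ("I", []), ("think", []), ("so", ["low-fall"])]`. Because the marks overload ordinary punctuation, a real transcription would need to keep them apart from orthographic punctuation.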
The proposal adopted by the Network of European Reference Corpora (NERC) (see 2.5.2) includes prosodic information in levels III and IV:
Taken together, TEI and NERC allow for the representation of global prosodic phenomena. When a more detailed representation is needed, the NERC report suggests using SAMPROSA (SAM Prosodic Alphabet) (Teubert, 1993; Sinclair, 1993).
In terms of prosody encoding, one recommendation would be to represent at least the two TEI elements Utterance and Pause (see 4). Other prosodic phenomena, such as those listed in NERC levels III and IV, seem to be better handled by a transcription system such as SAMPROSA. Among these phenomena, at least tone unit (or tone group) boundaries and stress (or tonic syllables) could be included in a transcription containing basic prosodic information.
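A minimal sketch of this baseline encoding, using Python's standard `xml.etree.ElementTree`, is shown below. The element names `u` and `pause` follow the TEI conventions for spoken texts; the helper `build_utterance`, the `who` value and the decision to leave pauses untimed are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def build_utterance(speaker, chunks):
    """Build a TEI-style <u> element from text chunks and pauses.

    `chunks` is a list whose items are either strings (transcribed text)
    or None, where None stands for an untimed <pause/>. The helper and
    its input format are hypothetical.
    """
    u = ET.Element("u", who=speaker)
    last = None  # most recent child element; later text becomes its tail
    for chunk in chunks:
        if chunk is None:
            last = ET.SubElement(u, "pause")
        elif last is None:
            u.text = (u.text or "") + chunk
        else:
            last.tail = (last.tail or "") + chunk
    return u
```

Serialising `build_utterance("A", ["well ", None, " I think so"])` with `ET.tostring(..., encoding="unicode")` gives `<u who="A">well <pause /> I think so</u>`. ElementTree stores text that follows a child element in that child's `tail`, which is why the helper tracks the last `<pause>`.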
As far as the transcription system to be used is concerned, it is worth quoting the opinion of the EAGLES Spoken Language Working Group:
It is reasonable to assume nowadays that a prosodic transcriber will have access to at least the waveform and the F0 curve for the speech to be transcribed. In that case, the recommendation is to use either the ToBI or the IPO system (and the MARSEC system if a purely auditory transcription is being carried out). If the language to be transcribed is not English, and specially if the projected application of the prosodic transcription is in the field of speech technology, then it is probably best to use the IPO system if possible (i.e., if the basic "grammar" of contours has already been researched for that language). However, these can only be provisional recommendations, as little work has been carried out in prosodic labelling. In this situation, it may be that a different system entirely will prove more appropriate to the given language, and it is not possible to make absolute recommendations.
More important than the choice of a particular system is the acknowledgement of how difficult it is to make recommendations in this area, given the present state of the art. ToBI is rapidly becoming a standard, despite its orientation towards the transcription of English and the theoretical phonological assumptions underlying it; SAMPROSA, for its part, offers the advantages of being accepted by NERC and of having been developed with both linguistic and speech technology needs in mind. The diversity of current proposals can, however, be overcome by developing mappings between systems that allow conversion from one to another. This was one of the recommendations issuing from the Madrid workshop 'Issues in Corpus Work', organised by the Text Corpus Working Group in January 1996.
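The idea of mapping between systems can be sketched as a lookup table plus two checks. Labels with no counterpart are reported rather than guessed, and target labels reached from several source labels are flagged, since converting through them loses information. The source labels below are the TEI punctuation marks discussed earlier. The target scheme (`fall`, `rise`, `complex`) and the function names are purely hypothetical and do not correspond to ToBI, SAMPROSA or any other real system.

```python
from collections import defaultdict

# Source: TEI punctuation marks. Target: a HYPOTHETICAL coarser scheme,
# used only to show what a many-to-one mapping does to a conversion.
# ":" (lengthening) is deliberately left unmapped.
TEI_TO_COARSE = {
    ".": "fall",
    "!": "fall",
    ",": "complex",
    "?": "rise",
}


def convert(labels, mapping):
    """Convert labels via `mapping`.

    Unmapped labels are passed through unchanged and also returned in a
    separate list so that a human can review them.
    """
    converted, unmapped = [], []
    for label in labels:
        if label in mapping:
            converted.append(mapping[label])
        else:
            converted.append(label)
            unmapped.append(label)
    return converted, unmapped


def lossy_targets(mapping):
    """Return {target: [sources]} for targets reached from >1 source.

    A conversion through such a target cannot be inverted uniquely.
    """
    sources = defaultdict(list)
    for src, tgt in mapping.items():
        sources[tgt].append(src)
    return {tgt: srcs for tgt, srcs in sources.items() if len(srcs) > 1}
```

Here `convert([".", ":", "?"], TEI_TO_COARSE)` returns `(["fall", ":", "rise"], [":"])`, and `lossy_targets(TEI_TO_COARSE)` reports `{"fall": [".", "!"]}`. This shows that a round trip back to TEI marks would not be able to distinguish a low-fall from a rise-fall.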
In terms of the dichotomies presented in the previous sections, it would be advisable to choose a multi-tiered, machine-readable and multilingual prosodic transcription system. A system that can be applied automatically, rather than relying on the transcriber's judgement, would be an important advantage when labelling large corpora.
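One way to picture a multi-tiered, machine-readable transcription is as a set of named tiers, each holding time-aligned intervals. Words, tone units and nuclear tones can then be annotated independently and queried together. The Python sketch below assumes times in seconds and uses hypothetical tier names (`words`, `tone_units`, `nuclear_tone`). It is not modelled on any particular existing annotation format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Interval:
    start: float  # seconds (assumed unit)
    end: float    # seconds
    label: str


@dataclass
class Transcription:
    # Tier name -> list of intervals on that tier.
    tiers: dict[str, list[Interval]] = field(default_factory=dict)

    def add(self, tier, start, end, label):
        """Append an interval to `tier`, creating the tier if needed."""
        if end <= start:
            raise ValueError("interval must have positive duration")
        self.tiers.setdefault(tier, []).append(Interval(start, end, label))

    def at(self, time):
        """Return {tier: label} for each tier with an interval covering `time`."""
        found = {}
        for name, intervals in self.tiers.items():
            for iv in intervals:
                if iv.start <= time < iv.end:  # half-open interval
                    found[name] = iv.label
                    break
        return found

    def to_json(self):
        """Serialise all tiers to JSON, one machine-readable option among many."""
        return json.dumps(asdict(self))
```

As a usage example, after adding `("words", 0.0, 0.4, "well")`, `("words", 0.4, 0.9, "really")`, `("tone_units", 0.0, 0.9, "TU1")` and `("nuclear_tone", 0.4, 0.9, "low-rise")`, the call `at(0.5)` returns the label on every tier at that instant: `{"words": "really", "tone_units": "TU1", "nuclear_tone": "low-rise"}`. Keeping tiers separate means an automatic labeller could fill one tier (for example, pauses) while a transcriber fills another.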