Summary of proposals and recommendations


In this document we have explored some ways of achieving compatibility between the NERC and TEI proposals, developed within the corpus linguistics community, and the practices usually followed by the speech community, as documented in the EAGLES Handbook on Spoken Language Systems and in the other sources reviewed.

The recommendations suggested here are based on surveys of current practice and tend to rest on elements common to different traditions. For this reason, they are very general and need further development to cover more specific needs.

For the encoding of spoken texts, we suggest encoding the following elements (see 2.5.3):

Vocal semi-lexical events
Vocal non-lexical events
Non-vocalised non-communicative events
Speaker identity
Speaking turns, indicating a change of speaker
Simultaneous (overlapping) speech
Omissions in read text
Word fragments
Unintelligible fragments

The need to develop conversion software between a user-friendly system of transcription and the TEI encoding scheme is also acknowledged.
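As an illustration of such conversion software, the sketch below turns lines written in a hypothetical user-friendly convention into TEI-style XML. The input conventions ("A:" for a speaker turn, "[cough]" for a vocal non-lexical event, "((...))" for an unintelligible fragment, "(.)" for a pause) and the function name `line_to_u` are assumptions made for this example. The element and attribute names (`u`, `who`, `vocal`, `desc`, `gap`, `reason`, `pause`) are loosely modelled on the TEI spoken-text elements and should be checked against the version of the TEI Guidelines actually in use.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical transcriber-friendly conventions (not defined in this document):
#   "A: text"   speaker A takes a turn
#   "[cough]"   vocal non-lexical event
#   "((...))"   unintelligible fragment
#   "(.)"       pause
TOKEN = re.compile(r"\[(?P<vocal>[^\]]+)\]|\(\((?P<gap>[^)]*)\)\)|(?P<pause>\(\.\))")


def _append_text(parent, last, chunk):
    """Attach plain text to parent.text, or to the tail of the previous child."""
    if not chunk:
        return
    if last is None:
        parent.text = (parent.text or "") + chunk
    else:
        last.tail = (last.tail or "") + chunk


def line_to_u(line: str) -> ET.Element:
    """Convert one 'SPEAKER: text' line into a TEI-style <u> element."""
    speaker, _, text = line.partition(":")
    text = text.strip()
    u = ET.Element("u", who=speaker.strip())
    last = None  # most recent child element, whose .tail receives following text
    pos = 0
    for m in TOKEN.finditer(text):
        _append_text(u, last, text[pos:m.start()])
        if m.group("vocal") is not None:
            last = ET.SubElement(u, "vocal", desc=m.group("vocal"))
        elif m.group("gap") is not None:
            last = ET.SubElement(u, "gap", reason="unintelligible")
        else:
            last = ET.SubElement(u, "pause")
        pos = m.end()
    _append_text(u, last, text[pos:])
    return u
```

For example, `line_to_u("A: well (.) I think [cough] it was ((...)) there")` serialises to `<u who="A">well <pause /> I think <vocal desc="cough" /> it was <gap reason="unintelligible" /> there</u>`. Keeping the transcriber-facing notation separate from the TEI output lets transcribers type quickly while the archived form stays in the interchange format.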

A proposal for transcription and labelling has been put forward, consisting of three levels (see 2.5.3):

S1 -- Orthographic representation of the text.
S2 -- Phonemic representation of words in citation form: that is, the forms in which words are pronounced in isolation.
S3 -- Phonetic transcription: a discrete symbolic representation of the actual realization of the utterance as perceived by the transcriber.

All these levels must, of course, be linked to the speech signal itself, and the use of automatic alignment techniques for this purpose is encouraged.
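One possible way to keep the three levels linked to the signal is to store, for each word, its start and end times together with its S1, S2 and S3 labels. The minimal sketch below assumes word-level alignment, times in seconds from the start of the signal file, SAMPA for S2 and X-SAMPA for S3; the class and function names, the time values and the example transcriptions are invented for illustration, and the automatic alignment step itself is not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordSegment:
    start: float  # seconds from the start of the signal file
    end: float
    s1: str       # orthographic form
    s2: str       # citation-form phonemic transcription (SAMPA)
    s3: str       # perceived realization (X-SAMPA)


def check_alignment(segments: List[WordSegment], duration: float) -> None:
    """Raise ValueError unless segments are ordered, non-overlapping and inside the signal."""
    prev_end = 0.0
    for seg in segments:
        if not (prev_end <= seg.start < seg.end <= duration):
            raise ValueError(f"bad time span for {seg.s1!r}: {seg.start}-{seg.end}")
        prev_end = seg.end


def at_time(segments: List[WordSegment], t: float) -> List[WordSegment]:
    """Return the segment(s) covering time t, giving access to all three levels at once."""
    return [seg for seg in segments if seg.start <= t < seg.end]


# Hypothetical example: "fish and chips", with "and" reduced to [@n] at level S3.
EXAMPLE = [
    WordSegment(0.10, 0.38, "fish", "fIS", "fIS"),
    WordSegment(0.38, 0.52, "and", "{nd", "@n"),
    WordSegment(0.52, 0.95, "chips", "tSIps", "tSIps"),
]
```

Here `at_time(EXAMPLE, 0.45)` returns the segment for "and", from which the orthographic form, the citation form `{nd` and the realized form `@n` can all be read.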

As far as the orthographic representation is concerned, the following recommendations are made (see 4.1):

Use conventional spelling forms as they appear in a standard dictionary. This also applies to contractions, reduced word forms, apostrophes, dialect forms, interjections and vocalised semi-lexical events.
If more than one orthographic form is possible, or if non-standard spellings or spelling variations are needed, maintain a lexicon of the spelling forms used in the transcription.
Represent numbers, abbreviations, acronyms and spelled words in full orthographic form, as pronounced by the speaker.
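A simple way to maintain the lexicon of spelling forms is to collect every word form that does not occur in the reference dictionary, and at the same time to flag digit strings that should have been written out in full. The sketch below assumes that the reference dictionary is available as a set of lower-case word forms and that a letter-based regular expression is an adequate tokenizer; both, and the function names, are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Letters only (Unicode-aware), optionally joined by apostrophes or hyphens: "don't", "well-known".
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def spelling_lexicon(transcripts: Iterable[str], dictionary: Set[str]) -> Counter:
    """Count word forms in the transcripts that are not found in the reference dictionary."""
    counts = Counter()
    for line in transcripts:
        for token in WORD.findall(line.lower()):
            if token not in dictionary:
                counts[token] += 1
    return counts


def unexpanded_numbers(transcripts: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line index, digits) for numbers that were not written out as pronounced."""
    return [(i, m) for i, line in enumerate(transcripts) for m in re.findall(r"\d+", line)]
```

With a toy dictionary containing "i'm", "go", "to", "the", "shop", "i", "don't" and "know", the lines "I'm gonna go to the shop" and "I dunno, I don't know" yield the candidate lexicon entries "gonna" and "dunno", each counted once; these are the forms whose spelling the project would then fix and document.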

Punctuation, however, is one aspect that still needs more in-depth discussion.

The rationale behind these recommendations is that they make it possible to link the orthographic transcription automatically to the phonemic representation at level S2.
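The automatic link can be as simple as a lookup in a pronunciation lexicon keyed by standard spellings. The sketch below assumes such a lexicon with SAMPA citation forms; the function name and the three-entry example lexicon are hypothetical. It also shows why conventional spelling matters: a non-standard form such as "an'" finds no entry, so the link to S2 breaks unless that form has been added to the lexicon.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pronunciation lexicon: orthographic form -> S2 citation form (SAMPA).
EXAMPLE_LEXICON: Dict[str, str] = {"fish": "fIS", "and": "{nd", "chips": "tSIps"}


def citation_forms(words: List[str], lexicon: Dict[str, str]) -> Tuple[List[Optional[str]], List[str]]:
    """Look up the S2 citation form of each S1 word.

    Returns one entry per input word (None where the word is not in the lexicon),
    plus the list of words that could not be linked.
    """
    forms: List[Optional[str]] = []
    missing: List[str] = []
    for w in words:
        form = lexicon.get(w.lower())
        forms.append(form)
        if form is None:
            missing.append(w)
    return forms, missing
```

Returning `None` in place keeps the S2 list aligned word by word with the S1 list, so any gap can be traced back to the orthographic word that caused it.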

Concerning the choice of a segmental transcription system (see 5.1.3), the IPA (International Phonetic Alphabet) is recommended. Whenever a machine-readable equivalent is needed, SAMPA (SAM Phonetic Alphabet) is recommended for phonemic transcriptions such as those proposed at level S2, and the X-SAMPA extension should be considered for phonetic transcriptions such as the one proposed at level S3.
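Because SAMPA is a one-to-one ASCII re-coding of IPA symbols, converting an IPA transcription into its machine-readable equivalent can be done symbol by symbol. The sketch below covers only a small subset of symbols used for English (plus the length and stress marks); the table name and function name are hypothetical, and a real converter would use the complete SAMPA or X-SAMPA tables. Lower-case ASCII letters are passed through unchanged, since SAMPA uses them with their IPA values.

```python
# Partial IPA -> SAMPA table (illustrative subset only).
IPA_TO_SAMPA = {
    "ʃ": "S", "ʒ": "Z", "θ": "T", "ð": "D", "ŋ": "N", "ɡ": "g",
    "ə": "@", "ɪ": "I", "ʊ": "U", "æ": "{", "ɑ": "A", "ɒ": "Q",
    "ɔ": "O", "ʌ": "V", "ɛ": "E", "ɜ": "3",
    "ː": ":",   # length mark
    "ˈ": '"',   # primary stress
    "ˌ": "%",   # secondary stress
}


def ipa_to_sampa(ipa: str) -> str:
    """Convert an IPA string to SAMPA, failing loudly on symbols outside the table."""
    out = []
    for ch in ipa:
        if "a" <= ch <= "z":
            out.append(ch)  # plain lower-case letters are shared by IPA and SAMPA
        elif ch in IPA_TO_SAMPA:
            out.append(IPA_TO_SAMPA[ch])
        else:
            raise ValueError(f"no SAMPA mapping for {ch!r} (U+{ord(ch):04X})")
    return "".join(out)
```

For example, `ipa_to_sampa("ʃɪp")` gives `SIp` and `ipa_to_sampa("fɜːst")` gives `f3:st`. Raising an error on unknown symbols, rather than copying them through, prevents non-ASCII characters from silently leaking into a supposedly machine-readable transcription.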

The prosodic elements to be encoded are discussed in 5.2.2, which suggests encoding at least the two TEI elements Utterance and Pause.
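When utterances are already time-aligned, pauses between them can be derived from the silent intervals and encoded alongside the utterances. The sketch below assumes input tuples of (speaker, start, end, text) sorted by start time, a hypothetical minimum pause length of 0.2 s, and a `dur` value written in seconds with an "s" suffix; all of these, the function name and the wrapping `div` element are illustrative choices, not values taken from this document or prescribed by TEI.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def utterances_with_pauses(utts: List[Tuple[str, float, float, str]],
                           min_pause: float = 0.2) -> ET.Element:
    """Emit <u> elements, inserting <pause dur="..."> where the silence is at least min_pause."""
    body = ET.Element("div")
    prev_end = None  # latest end time seen so far (handles overlapping speech)
    for who, start, end, text in utts:
        if prev_end is not None and start - prev_end >= min_pause:
            ET.SubElement(body, "pause", dur=f"{start - prev_end:.2f}s")
        u = ET.SubElement(body, "u", who=who)
        u.text = text
        prev_end = end if prev_end is None else max(prev_end, end)
    return body
```

Using the maximum end time seen so far means that overlapping speech never produces a spurious (negative) pause; only genuine silences at least `min_pause` long are marked.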

The choice of a prosodic transcription system is also discussed in 5.2.2. ToBI (Tone and Break Indices) and SAMPROSA (SAM Prosodic Alphabet), complemented by the X-SAMPA extension, are considered standard machine-readable systems, and the need to develop mappings between different systems is acknowledged. In general, a multi-tiered, machine-readable and multilingual prosodic transcription system is recommended.
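A multi-tiered, machine-readable prosodic transcription can be held as parallel time-stamped tiers that are checked against each other. The sketch below models a ToBI-style annotation with a word tier, a tone tier and a break-index tier (break indices 0 to 4, one per word end). The class name, the tiers' layout and the small tone inventory are assumptions for illustration; the inventory is only a subset of labels and should be replaced by the full label set of the ToBI variant actually used.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative subset of ToBI tone labels (not a complete inventory).
TONES = {"H*", "L*", "L+H*", "L*+H", "!H*", "H-", "L-", "!H-", "H%", "L%"}


@dataclass
class ProsodicTiers:
    words: List[Tuple[float, str]] = field(default_factory=list)   # (word end time, orthography)
    tones: List[Tuple[float, str]] = field(default_factory=list)   # (time, tone label)
    breaks: List[Tuple[float, str]] = field(default_factory=list)  # (word end time, break index)

    def problems(self) -> List[str]:
        """Return a list of consistency problems across the tiers (empty if none)."""
        out = []
        for t, label in self.tones:
            if label not in TONES:
                out.append(f"unknown tone {label!r} at {t}")
        for t, bi in self.breaks:
            # Only the leading digit is checked; any diacritic after it is ignored.
            if not bi or bi[0] not in "01234":
                out.append(f"bad break index {bi!r} at {t}")
        if [t for t, _ in self.breaks] != [t for t, _ in self.words]:
            out.append("break indices are not aligned with word ends")
        return out
```

Keeping each tier as a separate list makes it straightforward to add further tiers (for instance a SAMPROSA tier) and to write mappings between systems tier by tier.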

Some recommendations for data acquisition are also provided (see 3); they can be summarised as follows:

Use headset microphones for optimal acoustic quality, if the recording environment allows it.
Use digital recording devices. If direct recording into a computer is not possible, DAT (Digital Audio Tape) is recommended.
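When DAT recordings are later transferred digitally into a computer, it can be useful to check that the resulting files kept the tape's format. The sketch below reads a WAV file's header with Python's standard `wave` module and reports whether it is 16-bit audio at one of the DAT sampling rates (32, 44.1 or 48 kHz). The function name, the returned keys and the "DAT-compatible" criterion are assumptions for illustration, not requirements stated in this document.

```python
import wave
from typing import BinaryIO, Union

DAT_RATES = {32000, 44100, 48000}  # DAT sampling rates in Hz


def describe_recording(f: Union[str, BinaryIO]) -> dict:
    """Report the basic format of a WAV file (path or binary file object)."""
    with wave.open(f, "rb") as w:
        info = {
            "channels": w.getnchannels(),
            "bits": 8 * w.getsampwidth(),
            "rate": w.getframerate(),
            "seconds": w.getnframes() / w.getframerate(),
        }
    # Hypothetical check: 16-bit samples at a DAT sampling rate.
    info["dat_compatible"] = info["bits"] == 16 and info["rate"] in DAT_RATES
    return info
```

A file that reports, say, 8-bit samples or a 22.05 kHz rate has most likely been re-sampled or converted along the way, and the transfer should be repeated before the recording enters the corpus.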

Although most of these recommendations are based on current practice in different scientific communities, they can only be provisional: they must be validated and refined by applying them to different types of spoken material. They are nevertheless intended as a first step towards a common set of working conventions that could improve the reusability of speech and spoken language resources.

