Information about spoken language non-fluency phenomena: layer (h)

Spoken language corpora show a range of phenomena that do not normally occur in written language corpora, such as blends, false starts, reiterations and filled pauses. In syntactic annotation, it has to be decided whether to include such phenomena in the parse tree and, if so, how. To the best of our knowledge, this has been implemented only in the Polytechnic of Wales (POW) corpus (Souter 1989) and in the International Corpus of English. Sampson (1995: Ch. 6) mentions the possible inclusion of such phenomena in a future release of the SUSANNE Corpus, and ongoing work at Lancaster on the British National Corpus will incorporate non-fluency phenomena into the skeleton parsing of both written and spoken data. There is also a proposal to analyse such phenomena in the parsing of the British component of the International Corpus of English (Aarts 1992; Greenbaum 1992).