Sentence (S)

The maximal, syntactically independent segments into which a text is subdivided for parsing purposes are normally considered to be sentences. In a written text, they are typically (though by no means invariably) delimited by an initial capital letter and a final full stop (`.') or other terminal punctuation. It is convenient to accept this primary orthographic definition of `sentence' for the purposes of syntactic annotation. However, a sentence, so defined, may be either a full sentence:

(9)  [S This is a sentence. S]
or a `grammatically incomplete' one:
(10)  [S Well done. S]
The same applies to sentences included within other sentences, as in:

(11)  [S [S ``Well done'', S] she said. S]
``Well done'' in 11 is labelled as a sentence, since it clearly has an independent syntactic status equivalent to that of 9, even though it is included in another sentence. This inclusion of one independent sentence within another is found both with reported speech and elsewhere. Phenomena such as those illustrated in 10 and 11 are by no means exceptional in text corpora.
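
The nesting of one [S ... S] unit inside another can be made concrete with a small sketch. It assumes, purely for illustration, that the bracket markers `[S` and `S]` are separated from the words by whitespace, as in the examples above; the function name `parse_s_brackets` is hypothetical and is not part of any annotation toolkit.

```python
def parse_s_brackets(text):
    """Turn '[S ... S]' bracketing into nested lists of word tokens.

    Assumption: '[S' and 'S]' appear as whitespace-separated tokens.
    Each S unit becomes a list; an included S unit becomes a nested list.
    """
    root = []          # holds the top-level S units
    stack = [root]     # stack[-1] is the unit currently being filled
    for token in text.split():
        if token == "[S":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif token == "S]":
            if len(stack) == 1:
                raise ValueError("closing 'S]' without a matching '[S'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed '[S'")
    return root


tree = parse_s_brackets("[S [S ``Well done'', S] she said. S]")
print(tree)
```

In the resulting structure the quoted sentence appears as a list inside the list for the reporting sentence, mirroring the inclusion shown in 11.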

In transcriptions of spoken discourse, there is no simple answer to the question ``What is a sentence?''. Some transcriptions, based on standard orthography, yield de facto sentences in the form of units beginning with a capital letter and closing with a terminal punctuation mark. For these, there is no problem in recognising the primary sentential segments and delimiting them by [S ... S], even though these segments frequently lack the canonical structure of a complete written sentence. Moreover, even in other transcriptions, where the standard orthographic practices of sentence delimitation are avoided, it is possible to identify `primary segments' analogous to the written sentence, viz. the primary units into which the transcribed discourse is divided for parsing purposes. For spoken as well as written language, then, the [S] unit may be retained, although it may be interpreted differently, and some other term, such as `primary segment', may be preferred to `sentence'.

We conclude by recommending, for the syntactic annotation of any text (including a transcription of spoken language), an exhaustive division of the text into units labelled [S ... S].
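
One way to read `exhaustive division' operationally is that every word of the text falls inside some [S ... S] unit and that the brackets balance. The following is a minimal sketch of such a check, under the assumption that `[S` and `S]` are whitespace-separated tokens; the function name `is_exhaustively_divided` is hypothetical.

```python
def is_exhaustively_divided(text):
    """Return True if every word lies inside a balanced [S ... S] unit.

    Assumption: '[S' and 'S]' appear as whitespace-separated tokens.
    """
    depth = 0  # number of currently open S units
    for token in text.split():
        if token == "[S":
            depth += 1
        elif token == "S]":
            depth -= 1
            if depth < 0:
                return False   # close bracket with nothing open
        elif depth == 0:
            return False       # a word outside any S unit
    return depth == 0          # all opened units were closed


print(is_exhaustively_divided("[S This is a sentence. S] [S Well done. S]"))
print(is_exhaustively_divided("[S This is a sentence. S] Well done."))
```

The second call fails the check because `Well done.' is left outside any S unit, which is exactly the situation the recommendation rules out.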

