Next: Documentation and user information Up: Issues in practical application Previous: Punctuation

Recommendations

 

Bracketing of single-word constituents

A further issue that requires explicit handling in the documentation is this: what principle decides whether non-terminal constituents consisting of a single word are bracketed? Two extreme policies are possible: (a) to leave all single-word constituents unbracketed, or (b) to bracket every single-word constituent that shows its phrasal status, i.e. one that could take modifiers or be replaced by a multi-word phrase.

There are difficulties with both these solutions. Solution (a) is a problem, for example, where a single word is coordinated with a multi-word constituent, as in 89:

(89)  [NP [NP John NP] and [NP his sister NP] NP]
where, under guideline (a), John would be left unbracketed, so the equivalence of the two conjuncts would not be represented in the bracketing. Solution (b) may be felt to be unsatisfactory because it leads to a proliferation of brackets, as in 90, where one or more modifiers precede a noun:

(90)  [NP [DETP many DETP] [ADJP recent ADJP] [ADJP unexpected ADJP] arrivals NP]
The brackets around each of the first three words here indicate the potential phrasal status of these words: it is possible, for example, to add very before many.
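Returning to the coordination in 89, the loss of information under policy (a) can be made concrete with a small Python sketch. It reads the labelled-bracket notation used in 89 and 90 into a tree and checks whether the conjuncts of a coordination are bracketed alike. The function names (`parse`, `conjuncts_parallel`) are hypothetical, and the sketch assumes whitespace-separated tokens, opening brackets written `[LABEL`, closing brackets written `LABEL]`, and the literal word and as the coordinator.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A bracketed constituent: a category label plus its children."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse a bracketing such as '[NP [NP John NP] and [NP his sister NP] NP]'.

    Assumes whitespace-separated tokens: '[LABEL' opens a constituent,
    'LABEL]' closes it, and every other token is a word.
    """
    stack: List[Node] = []
    root = None
    for tok in text.split():
        if tok.startswith("["):
            if root is not None:
                raise ValueError("more than one top-level constituent")
            node = Node(tok[1:])
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
        elif tok.endswith("]"):
            if not stack or stack[-1].label != tok[:-1]:
                raise ValueError(f"unmatched closing bracket {tok!r}")
            node = stack.pop()
            if not stack:
                root = node
        else:
            if not stack:
                raise ValueError(f"word {tok!r} outside any constituent")
            stack[-1].children.append(tok)
    if stack or root is None:
        raise ValueError("unbalanced bracketing")
    return root


def conjuncts_parallel(node: Node, coordinator: str = "and") -> bool:
    """True if every conjunct of `node` is a bracketed constituent of one category.

    Hypothetical check: the conjuncts are taken to be all children of `node`
    other than the coordinator word itself.
    """
    conjuncts = [c for c in node.children
                 if not (isinstance(c, str) and c == coordinator)]
    labels = {c.label if isinstance(c, Node) else None for c in conjuncts}
    return None not in labels and len(labels) == 1


policy_b = parse("[NP [NP John NP] and [NP his sister NP] NP]")
policy_a = parse("[NP John and [NP his sister NP] NP]")
print(conjuncts_parallel(policy_b))  # True: both conjuncts are NPs
print(conjuncts_parallel(policy_a))  # False: John is a bare word
```

Under policy (a) the check fails even though John and his sister are conjuncts of the same kind: the equivalence is simply not recorded in the bracketing.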

Of the two guidelines, (b) poses a problem only for the human reader. When software is used to display bracketed sentences as trees (or in any other format), the proliferation of brackets is not an issue at all. Rule (b) may therefore be considered preferable, especially for interchange and retrieval purposes, since filtering out superfluous brackets is a much simpler task than inserting new ones, which requires deciding afresh which single words are constituents.

However, if a compromise between these two extremes is needed, a useful one may be to bracket single-word constituents only where they are major constituents of the sentence, e.g. Subject or Object, or where they are coordinated with multi-word constituents, as in 89 above. Such conditions need to be made explicit in the documentation of the annotation scheme. (For an actual application, see Sampson 1995: 172f for an account of how single-word constituents are handled in the SUSANNE corpus.)
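The asymmetry between filtering and inserting brackets can be illustrated with a short Python sketch. Given a bracketing made under policy (b), removing every bracket that encloses exactly one word is a purely mechanical tree operation; the reverse would require knowing which words are phrases and what their labels are, which an unbracketed string does not record. The helper names (`parse`, `to_text`, `strip_single_word`) are hypothetical. The optional `keep_in_coordination` flag approximates only the coordination half of the compromise, by assuming that any constituent containing the word and is a coordination; the Subject/Object condition needs functional information that this plain notation lacks, so it is not attempted.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A bracketed constituent: a category label plus its children."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse '[LABEL ... LABEL]' bracketing with whitespace-separated tokens."""
    stack: List[Node] = []
    root = None
    for tok in text.split():
        if tok.startswith("["):
            if root is not None:
                raise ValueError("more than one top-level constituent")
            node = Node(tok[1:])
            if stack:
                stack[-1].children.append(node)
            stack.append(node)
        elif tok.endswith("]"):
            if not stack or stack[-1].label != tok[:-1]:
                raise ValueError(f"unmatched closing bracket {tok!r}")
            node = stack.pop()
            if not stack:
                root = node
        else:
            if not stack:
                raise ValueError(f"word {tok!r} outside any constituent")
            stack[-1].children.append(tok)
    if stack or root is None:
        raise ValueError("unbalanced bracketing")
    return root


def to_text(node: Node) -> str:
    """Serialise a tree back into the same labelled-bracket notation."""
    inner = [c if isinstance(c, str) else to_text(c) for c in node.children]
    return " ".join([f"[{node.label}", *inner, f"{node.label}]"])


def strip_single_word(node: Node, keep_in_coordination: bool = False) -> Node:
    """Return a copy of `node` with brackets around one-word constituents removed.

    With keep_in_coordination=True, one-word brackets are kept wherever their
    parent also contains the word 'and' (a crude, assumed test for coordination).
    The outermost bracket is never removed.
    """
    coordinated = keep_in_coordination and any(
        isinstance(c, str) and c == "and" for c in node.children)
    new_children: List[Union[Node, str]] = []
    for child in node.children:
        if isinstance(child, Node):
            child = strip_single_word(child, keep_in_coordination)
            is_one_word = len(child.children) == 1 and isinstance(child.children[0], str)
            if is_one_word and not coordinated:
                child = child.children[0]  # replace [X w X] by the bare word w
        new_children.append(child)
    return Node(node.label, new_children)


tree90 = parse("[NP [DETP many DETP] [ADJP recent ADJP] [ADJP unexpected ADJP] arrivals NP]")
print(to_text(strip_single_word(tree90)))
# [NP many recent unexpected arrivals NP]

tree89 = parse("[NP [NP John NP] and [NP his sister NP] NP]")
print(to_text(strip_single_word(tree89)))
# [NP John and [NP his sister NP] NP]
print(to_text(strip_single_word(tree89, keep_in_coordination=True)))
# [NP [NP John NP] and [NP his sister NP] NP]
```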


