This phenomenon we define as an unresolved choice between alternative annotations. Again, there may be a number of reasons for ambiguous syntactic annotation, some motivated by practical considerations, and others by more theoretical linguistic considerations. Here we distinguish two major kinds of ambiguity.
A system ambiguity is one that results from the limitations of a parser. Automatic corpus parsers, which are likely to be increasingly employed for large-scale syntactic annotation in the future, are at present far from achieving the 'ideal' outcome of a single correct parse for each sentence. Where a parsing error is likely, the preferable solution for most purposes is for the parser to output more than one parse, leaving it to the human interpreter to decide which analysis is correct. Thus the Helsinki ENGCG parser leaves alternative analyses in its output where it cannot achieve disambiguation. This is illustrated in table 10, where, because addresses is ambiguous between a verb and a plural noun, two syntactic structures are possible: [NP this paper NP] [VP addresses ...] and [NP this [paper addresses] NP].
|Word form|ENGCG reading|
|This|"this" <*> DET CENTRAL DEM SG @DN>|
|paper|"paper" N NOM SG @SUBJ @NN>|
|addresses|"address" <SVO> V PRES SG3 VFIN @+FMAINV|
||"address" N NOM PL @NPHR|
|three|"three" NUM CARD @QN>|
|main|"main" A ABS @AN>|
|issues|"issue" N NOM PL @NPHR @OBJ|
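To make the notion of a system ambiguity concrete, the following Python sketch scans ENGCG-style readings like those in table 10 and reports which word forms the parser has left ambiguous. A token counts as ambiguous if it has more than one reading (addresses) or, as we assume for ENGCG output, more than one syntactic function tag within a single reading (paper, issues). The simplified one-line-per-reading format, the word forms paired with each reading, and the names parse_reading and ambiguous_tokens are our own illustration, not part of ENGCG.

```python
import re
from dataclasses import dataclass


@dataclass
class Reading:
    base: str             # base form, e.g. "address"
    tags: list[str]       # non-function tags, e.g. ["<SVO>", "V", "PRES"]
    functions: list[str]  # syntactic function tags, e.g. ["@SUBJ", "@NN>"]


def parse_reading(line: str) -> Reading:
    """Parse one simplified reading such as '"paper" N NOM SG @SUBJ @NN>'."""
    m = re.match(r'\s*"([^"]+)"\s*(.*)', line)
    if m is None:
        raise ValueError(f"not a reading: {line!r}")
    rest = m.group(2).split()
    tags = [t for t in rest if not t.startswith("@")]
    functions = [t for t in rest if t.startswith("@")]
    return Reading(m.group(1), tags, functions)


def ambiguous_tokens(cohorts: list[tuple[str, list[str]]]) -> list[str]:
    """Return the word forms left ambiguous: several readings, or several
    function tags within one reading."""
    result = []
    for word, lines in cohorts:
        readings = [parse_reading(line) for line in lines]
        if len(readings) > 1 or any(len(r.functions) > 1 for r in readings):
            result.append(word)
    return result


# Table 10 as (word form, readings) pairs; the word forms are assumed.
TABLE_10 = [
    ("This", ['"this" <*> DET CENTRAL DEM SG @DN>']),
    ("paper", ['"paper" N NOM SG @SUBJ @NN>']),
    ("addresses", ['"address" <SVO> V PRES SG3 VFIN @+FMAINV',
                   '"address" N NOM PL @NPHR']),
    ("three", ['"three" NUM CARD @QN>']),
    ("main", ['"main" A ABS @AN>']),
    ("issues", ['"issue" N NOM PL @NPHR @OBJ']),
]

print(ambiguous_tokens(TABLE_10))  # ['paper', 'addresses', 'issues']
```

Whether function-tag ambiguity within one reading should be counted alongside competing readings is a design choice; a posteditor would in either case only need to inspect the flagged tokens.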
Although in principle a manual posteditor could eliminate most system ambiguities, in practice such a postediting phase may be too expensive or too time-consuming, in which case the ambiguities will have to remain in the annotated corpus.
The second kind, use ambiguity, is one that cannot be resolved even by a human posteditor: even when context is taken into account, a human interpreter cannot disambiguate the sentence. Such 'genuine ambiguities' may result in a glaring difference of meaning, as in 80 and 81, where it is not clear whether or not Stephen Glanville is Agatha Christie's husband, or in a subtler one, as in 82 and 83, where now and then may modify either speak (82) or the whole let construction (83):
|(80)||Stephen Glanville, who died in 1956, was [NP an Egyptologist friend [PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP]|
|(81)||Stephen Glanville, who died in 1956, was [NP [NP an Egyptologist friend [PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP]|
|(82)||They [VP let me [VP speak [ADVP now and then ADVP] VP] VP]|
|(83)||They [VP let me [VP speak VP] [ADVP now and then ADVP] VP]|
Our recommendation is, where feasible, to indicate ambiguities in the annotated corpus, and to specify by an annotation device whether each ambiguity is of the system type or the use type. The nature and function of the annotation device should be specified in the documentation accompanying the annotation scheme.
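One way to carry the system/use distinction through a processing pipeline is to store it alongside the alternatives themselves. The Python sketch below is hypothetical throughout: the class names, the //system and //use markers, and the rule that only system ambiguities may be resolved by a posteditor are our own assumptions, the last following the distinction drawn above.

```python
from dataclasses import dataclass
from enum import Enum


class AmbiguityType(Enum):
    SYSTEM = "system"  # left unresolved by the parser; a posteditor could resolve it
    USE = "use"        # genuine ambiguity; not resolvable even with context


@dataclass(frozen=True)
class AmbiguitySet:
    alternatives: tuple[str, ...]
    kind: AmbiguityType

    def __post_init__(self) -> None:
        if len(self.alternatives) < 2:
            raise ValueError("an ambiguity needs at least two alternatives")

    def render(self) -> str:
        # Hypothetical marker: the ambiguity type follows the opening //.
        return f"//{self.kind.value} " + " / ".join(self.alternatives) + " //"

    def resolve(self, choice: int) -> str:
        """Let a posteditor pick one alternative (system ambiguities only)."""
        if self.kind is AmbiguityType.USE:
            raise ValueError("a use ambiguity cannot be resolved by postediting")
        return self.alternatives[choice]


# The adverbial attachment of examples 82 and 83, recorded as a use ambiguity.
adverbial = AmbiguitySet(
    ("[VP speak [ADVP now and then ADVP] VP]",
     "[VP speak VP] [ADVP now and then ADVP]"),
    AmbiguityType.USE,
)
print(adverbial.render())
# //use [VP speak [ADVP now and then ADVP] VP] / [VP speak VP] [ADVP now and then ADVP] //
```

Making the type part of the record, rather than a free-text note, lets a postediting tool refuse to collapse use ambiguities while still offering system ambiguities for resolution.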
Syntactic ambiguities may be of various types: in fact, all the layers of syntactic annotation listed in section 3 (bracketing of segments, labelling of segments, subcategorisation of segments, etc.) may give rise to ambiguity. Most cases of ambiguity, however, can be reduced to (a) ambiguities in the configuration of nodes in a syntactic tree, (b) ambiguities in the labelling of nodes, and (c) ambiguities combining (a) and (b). In a phrase structure model of annotation, these can simply be listed as (a′) bracketing, (b′) labelling and (c′) bracketing + labelling.
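In a phrase structure model, the three-way division into bracketing, labelling, and combined ambiguity can be checked mechanically once each analysis is reduced to a set of labelled spans over the word string. The Python sketch below assumes the space-separated bracket notation of examples 80 to 83 ([NP opens a constituent, NP] closes it); the function names are hypothetical, and treating a difference in span sets as bracketing ambiguity and a different label on a shared span as labelling ambiguity is our own operationalisation.

```python
from collections import defaultdict


def parse(analysis: str) -> tuple[list[str], list[tuple[str, int, int]]]:
    """Split a labelled bracketing such as '[NP her husband NP]' into its
    words and its labelled spans (label, start, end), end exclusive."""
    words: list[str] = []
    spans: list[tuple[str, int, int]] = []
    stack: list[tuple[str, int]] = []
    for tok in analysis.split():
        if tok.startswith("[") and len(tok) > 1:    # opening bracket, e.g. [NP
            stack.append((tok[1:], len(words)))
        elif tok.endswith("]") and len(tok) > 1:    # closing bracket, e.g. NP]
            if not stack:
                raise ValueError(f"{tok} closes an unopened bracket")
            label, start = stack.pop()
            if label != tok[:-1]:
                raise ValueError(f"[{label} closed by {tok}")
            spans.append((label, start, len(words)))
        else:
            words.append(tok)
    if stack:
        raise ValueError("unclosed bracket")
    return words, spans


def classify(a: str, b: str) -> str:
    """Classify the ambiguity between two analyses of the same words."""
    words_a, spans_a = parse(a)
    words_b, spans_b = parse(b)
    if words_a != words_b:
        raise ValueError("the analyses cover different word strings")
    labels_a, labels_b = defaultdict(list), defaultdict(list)
    for labels, spans in ((labels_a, spans_a), (labels_b, spans_b)):
        for label, start, end in spans:
            labels[(start, end)].append(label)
    bracketing = labels_a.keys() != labels_b.keys()
    labelling = any(sorted(labels_a[k]) != sorted(labels_b[k])
                    for k in labels_a.keys() & labels_b.keys())
    if bracketing and labelling:
        return "bracketing + labelling"
    return "bracketing" if bracketing else "labelling" if labelling else "none"


EX80 = "[NP an Egyptologist friend [PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP]"
EX81 = "[NP [NP an Egyptologist friend [PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP]"
EX82 = "They [VP let me [VP speak [ADVP now and then ADVP] VP] VP]"
EX83 = "They [VP let me [VP speak VP] [ADVP now and then ADVP] VP]"

print(classify(EX80, EX81))  # bracketing
print(classify(EX82, EX83))  # bracketing
print(classify("[NP flying planes NP]", "[VP flying planes VP]"))  # labelling
```

Both pairs of use-ambiguity examples thus come out as pure bracketing ambiguities; the invented flying planes pair shows a labelling ambiguity with identical brackets.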
As an indication of how ambiguities may in practice be included in an annotation scheme, we illustrate one method of specifying alternative analyses in a linear (horizontal) format for phrase structure annotation. (The actual choice of an ambiguity-encoding device is a matter for the Text Representation guidelines.) The device we illustrate can be schematized as:
// x1 / x2 / ... / xn //
where // delimits the whole ambiguity set, / separates one alternative analysis from the next, and x1, x2, ... xn represent the alternative analyses. The same device can be used for alternative labellings, alternative bracketings, and alternative labellings + bracketings. The representation least open to misunderstanding is one in which x1, x2, ... xn are entire sentences, but this method suffers from prolixity. A more satisfactory method is probably to 'factor out' the parts of the sentence that give rise to the ambiguity from the parts that have a constant interpretation, and to apply the ambiguity set only to the former. This will clearly be the most practical solution where the ambiguity is purely a matter of labelling. As an example of the proposed notation, 84 indicates the readings of both 80 and 81, with the constant part 'Stephen Glanville, who died in 1956, was' factored out:
|(84)||Stephen Glanville, who died in 1956, was // [NP an Egyptologist friend [PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP] / [NP [NP an Egyptologist friend [PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP] //|
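The factoring described above, and its inverse, can be sketched in Python. The sketch assumes the space-separated bracket notation of 80 to 84 and, as a conservative choice of our own, factors out only a shared prefix or suffix whose brackets are balanced, so that every alternative remains a well-formed bracketing; a human annotator might choose a different split. The names factor and expand are hypothetical.

```python
import itertools
import re


def _opens(tok: str) -> bool:
    return tok.startswith("[") and len(tok) > 1


def _closes(tok: str) -> bool:
    return tok.endswith("]") and len(tok) > 1


def _shared_len(seqs: list[list[str]]) -> int:
    """Length of the prefix shared by all token sequences."""
    n = 0
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        n += 1
    return n


def factor(alternatives: list[str]) -> str:
    """Write full-sentence alternatives as one // x1 / ... / xn // set,
    factoring out the longest shared prefix and suffix that are bracket-balanced."""
    alts = [a.split() for a in alternatives]
    depth, cut = 0, 0
    for k, tok in enumerate(alts[0][:_shared_len(alts)], start=1):
        depth += int(_opens(tok)) - int(_closes(tok))
        if depth == 0:
            cut = k  # the prefix up to here opens no unclosed bracket
    rest = [a[cut:] for a in alts]
    depth, scut = 0, 0
    m = _shared_len([r[::-1] for r in rest])
    for k, tok in enumerate(rest[0][::-1][:m], start=1):
        depth += int(_closes(tok)) - int(_opens(tok))
        if depth == 0:
            scut = k  # the suffix from here closes no bracket opened earlier
    middles = [" ".join(r[:len(r) - scut]) for r in rest]
    suffix = rest[0][len(rest[0]) - scut:]
    return " ".join(alts[0][:cut] + ["//", " / ".join(middles), "//"] + suffix)


SET = re.compile(r"//\s*(.*?)\s*//")


def expand(annotation: str) -> list[str]:
    """Multiply out every // ... // set into the full readings it encodes."""
    parts = SET.split(annotation)  # constant text at even, sets at odd indices
    fixed = parts[0::2]
    options = [p.split(" / ") for p in parts[1::2]]
    readings = []
    for choice in itertools.product(*options):
        pieces = [fixed[0]]
        for alt, text in zip(choice, fixed[1:]):
            pieces += [alt, text]
        readings.append(" ".join(" ".join(pieces).split()))
    return readings


PREFIX = "Stephen Glanville, who died in 1956, was"
B80 = "[NP an Egyptologist friend [PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP]"
B81 = "[NP [NP an Egyptologist friend [PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP]"
EX80, EX81 = f"{PREFIX} {B80}", f"{PREFIX} {B81}"
EX84 = f"{PREFIX} // {B80} / {B81} //"

print(factor([EX80, EX81]) == EX84)  # True
print(expand(EX84) == [EX80, EX81])  # True
```

The shared opening [NP of 80 and 81 is deliberately not factored out, since that would leave the alternatives with unmatched closing brackets; this reproduces exactly the division chosen in 84.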