
Recommendations

Ambiguity

We define ambiguity as an unresolved choice between alternative annotations. Again, there may be a number of reasons for ambiguous syntactic annotation, some motivated by practical considerations and others by more theoretical linguistic ones. Here we distinguish two major kinds of ambiguity.

System ambiguity

A system ambiguity is one that results from the limitations of a parser. Automatic corpus parsers, which are likely to be used increasingly for large-scale syntactic annotation, are at present far from achieving the `ideal' outcome of a single correct parse for each sentence. Where a parsing error is likely, the preferable solution for most purposes is for the parser to output more than one parse, leaving it to the human interpreter to decide which analysis is correct. Thus the Helsinki ENGCG parser leaves alternative analyses in its output where disambiguation cannot be achieved. This is illustrated in table 10, where, because addresses can be either a present-tense verb or a plural noun, two syntactic structures are possible: [NP this paper NP] [VP addresses ...] and [NP this [paper addresses] NP].

 

"<*this>"
          "this" <*> DET CENTRAL DEM SG @DN>
"<paper>"
          "paper" N NOM SG @SUBJ @NN>
"<addresses>"
          "address" <SVO> V PRES SG3 VFIN @+FMAINV
          "address" N NOM PL @NPHR
"<three>"
          "three" NUM CARD @QN>
"<main>"
          "main" A ABS @AN>
"<issues>"
          "issue" N NOM PL @NPHR @OBJ
Table 10: Indicating ambiguity in ENGCG 
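To make the multiplication of readings concrete, the following Python sketch reads the output of table 10 and counts how many fully specified analyses it leaves open. It rests on a simplified reading of the format, which is an assumption rather than a description of ENGCG itself: a line "<form>" opens a word, each indented line is one morphological reading, and several @-tags on one reading are alternative syntactic functions. The function names are hypothetical.

```python
import math
import re

# The output of table 10, copied verbatim.
ENGCG_OUTPUT = '''\
"<*this>"
          "this" <*> DET CENTRAL DEM SG @DN>
"<paper>"
          "paper" N NOM SG @SUBJ @NN>
"<addresses>"
          "address" <SVO> V PRES SG3 VFIN @+FMAINV
          "address" N NOM PL @NPHR
"<three>"
          "three" NUM CARD @QN>
"<main>"
          "main" A ABS @AN>
"<issues>"
          "issue" N NOM PL @NPHR @OBJ
'''

# An indented reading line: "lemma" followed by its tags.
READING = re.compile(r'\s+"([^"]+)"\s*(.*)')


def parse_cohorts(text):
    """Return (word form, readings) pairs; a reading is (lemma, morph tags, @-tags)."""
    cohorts = []
    for line in text.splitlines():
        if line.startswith('"<'):
            cohorts.append((line.strip()[2:-2], []))  # '"<paper>"' -> 'paper'
        elif line.strip():
            match = READING.match(line)
            if match is None or not cohorts:
                raise ValueError(f"unexpected line: {line!r}")
            lemma, tags = match.group(1), match.group(2).split()
            morph = [t for t in tags if not t.startswith("@")]
            syntax = [t for t in tags if t.startswith("@")]
            cohorts[-1][1].append((lemma, morph, syntax))
    return cohorts


def analyses(readings):
    """Expand each morphological reading into one analysis per syntactic tag."""
    return [(lemma, tuple(morph), tag)
            for lemma, morph, syntax in readings
            for tag in (syntax or [None])]


def ambiguous_words(cohorts):
    """Word forms for which the output leaves more than one analysis."""
    return [form for form, readings in cohorts if len(analyses(readings)) > 1]


def count_combinations(cohorts):
    """Ways of choosing one analysis per word, ignoring every constraint."""
    return math.prod(len(analyses(readings)) for _, readings in cohorts)


cohorts = parse_cohorts(ENGCG_OUTPUT)
print(ambiguous_words(cohorts))     # ['paper', 'addresses', 'issues']
print(count_combinations(cohorts))  # 8
```

The figure of 8 is only an unconstrained upper bound: many of these combinations are mutually incompatible, and the text above identifies two plausible structures among them.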

Although in principle a manual posteditor could eliminate most system ambiguities, in practice such a postediting phase may be too expensive or too time-consuming, so that ambiguities will have to remain in the annotated corpus.

Use ambiguity

A use ambiguity, by contrast, cannot be resolved by a human posteditor: even when context is taken into account, a human interpreter cannot disambiguate the sentence. Such `genuine ambiguities' may result in a glaring difference of meaning, as in 80 and 81, where it is not clear whether Stephen Glanville is Agatha Christie's husband or not:

(80)  Stephen Glanville, who died in 1956, was [NP an Egyptologist friend [PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP]
(81)  Stephen Glanville, who died in 1956, was [NP [NP an Egyptologist friend [PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP]
On the other hand, a surprising number of use ambiguities make relatively little difference to meaning, as in 82 and 83:

(82)  They [VP let me [VP speak [ADVP now and then ADVP] VP] VP]
(83)  They [VP let me [VP speak VP] [ADVP now and then ADVP] VP]
In 82 and 83, the adverbial now and then can equally well modify let me [speak] or just speak.

Our recommendation is, where feasible, to indicate ambiguities in the annotated corpus, and to specify by an annotation device whether the ambiguity is of the system type or the use type. The nature and function of the annotation device should be specified in the documentation to the annotation scheme.
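One way to follow this recommendation in machine-readable form is to store each ambiguity together with a type flag. The Python sketch below is a hypothetical data structure (the names AmbiguityKind, AmbiguitySet and postediting_queue are illustrative and belong to no published scheme). It also encodes the practical consequence of the distinction drawn above: only system ambiguities are candidates for manual postediting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class AmbiguityKind(Enum):
    SYSTEM = "system"  # caused by parser limitations; a posteditor could resolve it
    USE = "use"        # genuine ambiguity; unresolved even in context


@dataclass(frozen=True)
class AmbiguitySet:
    alternatives: Tuple[str, ...]
    kind: AmbiguityKind

    def __post_init__(self):
        # An ambiguity needs at least two distinct alternative analyses.
        if len(set(self.alternatives)) < 2:
            raise ValueError("an ambiguity set needs at least two distinct alternatives")

    @property
    def resolvable_by_posteditor(self) -> bool:
        return self.kind is AmbiguityKind.SYSTEM


def postediting_queue(sets: Iterable[AmbiguitySet]) -> List[AmbiguitySet]:
    """Keep only the ambiguities that a human posteditor could in principle resolve."""
    return [s for s in sets if s.resolvable_by_posteditor]


# Use ambiguity: examples 82 and 83.
speak = AmbiguitySet(
    ("They [VP let me [VP speak [ADVP now and then ADVP] VP] VP]",
     "They [VP let me [VP speak VP] [ADVP now and then ADVP] VP]"),
    AmbiguityKind.USE,
)
# System ambiguity: the two readings of addresses in table 10.
addresses = AmbiguitySet(
    ('"address" <SVO> V PRES SG3 VFIN @+FMAINV',
     '"address" N NOM PL @NPHR'),
    AmbiguityKind.SYSTEM,
)

print(postediting_queue([speak, addresses]) == [addresses])  # True
```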

Methods of indicating ambiguities

Syntactic ambiguities may be of various types: in fact, all the layers of syntactic annotation listed in section 3 (bracketing of segments, labelling of segments, subcategorisation of segments, etc.) may give rise to ambiguity. Most cases of ambiguity, however, can be reduced to (a) ambiguities in the configuration of nodes in a syntactic tree, (b) ambiguities in the labelling of nodes, and (c) ambiguities combining (a) and (b). In a phrase structure model of annotation, these can be listed simply as (a´) bracketing, (b´) labelling and (c´) bracketing + labelling.
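In phrase structure annotation, the three cases can be told apart mechanically by comparing the constituent spans of two alternative analyses, first without and then with their labels. The Python sketch below assumes the bracket convention of examples 80 to 83 (an opening token [X and a closing token X], separated from words by spaces); the function names are hypothetical, and treating the labels on one span as a set is a simplification.

```python
from collections import Counter, defaultdict


def constituents(bracketed):
    """Return (label, start, end) spans; positions count words, end is exclusive."""
    stack, spans, position = [], [], 0
    for token in bracketed.split():
        if token.startswith("["):
            stack.append((token[1:], position))
        elif token.endswith("]"):
            if not stack:
                raise ValueError(f"unmatched closing bracket {token}")
            label, start = stack.pop()
            if label != token[:-1]:
                raise ValueError(f"[{label} closed by {token}")
            spans.append((label, start, position))
        else:
            position += 1  # an ordinary word
    if stack:
        raise ValueError("unclosed bracket")
    return spans


def labels_by_span(spans):
    table = defaultdict(set)
    for label, start, end in spans:
        table[(start, end)].add(label)
    return table


def ambiguity_type(a, b):
    """Classify how two analyses differ: bracketing, labelling, or both."""
    spans_a, spans_b = constituents(a), constituents(b)
    # (a') bracketing: the unlabelled spans differ.
    bracketing = (Counter((s, e) for _, s, e in spans_a)
                  != Counter((s, e) for _, s, e in spans_b))
    # (b') labelling: a span present in both analyses carries different labels.
    labels_a, labels_b = labels_by_span(spans_a), labels_by_span(spans_b)
    shared = labels_a.keys() & labels_b.keys()
    labelling = any(labels_a[span] != labels_b[span] for span in shared)
    if bracketing and labelling:
        return "bracketing + labelling"
    if bracketing:
        return "bracketing"
    if labelling:
        return "labelling"
    return "identical"


EXAMPLE_82 = "They [VP let me [VP speak [ADVP now and then ADVP] VP] VP]"
EXAMPLE_83 = "They [VP let me [VP speak VP] [ADVP now and then ADVP] VP]"
print(ambiguity_type(EXAMPLE_82, EXAMPLE_83))  # bracketing
```

Examples 80 and 81 likewise come out as a pure bracketing ambiguity, since every span they share is labelled NP in both.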

As an indication of how ambiguities may in practice be included in an annotation scheme, we illustrate one method of specifying alternative analyses in a linear (horizontal) format for phrase structure annotation. (The actual choice of an ambiguity-encoding device is a matter for the Text Representation guidelines.) The device we illustrate can be symbolised as

// x1 / x2 / ... / xn //

where // delimits the whole ambiguity set, / separates one alternative analysis from another, and x1, x2, ... xn represent the alternative analyses. The same device can be used for alternative labellings, alternative bracketings, and alternative labellings + bracketings. The method of representation least open to misunderstanding is the one where x1, x2, ... xn are entire sentences. But this method also suffers from prolixity, and a more satisfactory method can probably be arrived at by `factoring out' the parts of the sentence that have a constant interpretation, so that the alternatives cover only the parts that give rise to ambiguity. This will clearly be the most practical solution where the ambiguity is purely a matter of labelling. As an example of the proposed notation, 84 indicates the readings of both 80 and 81, with their shared opening factored out:

(84)  Stephen Glanville, who died in 1956, was // [NP an Egyptologist friend [PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP] / [NP [NP an Egyptologist friend [PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP] //
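The factoring described above can be sketched in Python for the simplest case. This is an illustration under stated assumptions, not a proposed standard: it tokenises on spaces, factors out only shared leading and trailing words (never bracket tokens, so that each alternative stays well-bracketed on its own), and its inverse expands non-nested // ... // sets back into full-sentence alternatives. The function names factor and expand are hypothetical. Applied to 80 and 81, factor reproduces 84.

```python
from itertools import product

EXAMPLE_80 = ("Stephen Glanville, who died in 1956, was [NP an Egyptologist friend "
              "[PP of [NP [NP Agatha Christie NP] and [NP her husband NP] NP] PP] NP]")
EXAMPLE_81 = ("Stephen Glanville, who died in 1956, was [NP [NP an Egyptologist friend "
              "[PP of [NP Agatha Christie NP] PP] NP] and [NP her husband NP] NP]")


def is_bracket(token):
    return token.startswith("[") or token.endswith("]")


def factor(alternatives):
    """Write full-sentence alternatives as: shared words // x1 / x2 / ... // shared words."""
    tokens = [alt.split() for alt in alternatives]
    first, shortest = tokens[0], min(len(t) for t in tokens)

    def shared(i):
        # Factorable: an ordinary word, identical at this position in every alternative.
        return not is_bracket(first[i]) and all(t[i] == first[i] for t in tokens)

    prefix = 0
    while prefix < shortest and shared(prefix):
        prefix += 1
    suffix = 0
    while suffix < shortest - prefix and shared(-1 - suffix):
        suffix += 1
    middles = [" ".join(t[prefix:len(t) - suffix]) for t in tokens]
    return " ".join(first[:prefix] + ["//", " / ".join(middles), "//"]
                    + first[len(first) - suffix:])


def expand(notation):
    """Expand every (non-nested) // ... // set into full-sentence alternatives."""
    segments, current, inside = [], [[]], False
    for token in notation.split():
        if token == "//":
            segments.append(current)
            current, inside = [[]], not inside
        elif token == "/" and inside:
            current.append([])  # start the next alternative
        else:
            current[-1].append(token)
    if inside:
        raise ValueError("unterminated ambiguity set")
    segments.append(current)
    return [" ".join(tok for part in choice for tok in part)
            for choice in product(*segments)]


notation = factor([EXAMPLE_80, EXAMPLE_81])
print(notation)                                        # the text of 84
print(expand(notation) == [EXAMPLE_80, EXAMPLE_81])    # True
```

A bolder strategy could also factor the outer [NP ... NP] shared by both readings, at the cost of alternatives that are no longer independently well-bracketed; the sketch deliberately stops at bracket tokens.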
It should be noted, finally, that ambiguity and underspecification are technically different (the one spells out alternatives, the other leaves them implicit), but that in practice they may amount to the same thing. For example, in dependency tree annotation it may be found convenient (as in current versions of ENGCG) to omit the identification of the head of a dependency link while specifying the head's word class. This omission fits the definition of underspecification, but it would be possible to identify every possible head in the sentence for a given dependent, and hence to state the alternative analyses explicitly, thus reformulating the underspecification as ambiguity.
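The reformulation described in the last sentence can be sketched in Python. The sketch assumes, as ENGCG tags such as @DN> and @AN> suggest, that a dependent's tag records the word class of its head and the side on which the head lies, but not which word it is. The token representation and the function names are hypothetical, and the word classes are a hand-disambiguated simplification of table 10 (with addresses taken as a verb).

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, word class)


def candidate_heads(tokens: List[Token], dependent: int,
                    head_class: str, direction: str) -> List[int]:
    """Every word of the required class on the required side of the dependent."""
    if direction == ">":      # head lies to the right
        positions = range(dependent + 1, len(tokens))
    elif direction == "<":    # head lies to the left, nearest first
        positions = range(dependent - 1, -1, -1)
    else:
        raise ValueError(f"unknown direction {direction!r}")
    return [i for i in positions if tokens[i][1] == head_class]


def as_ambiguity(tokens: List[Token], dependent: int,
                 head_class: str, direction: str) -> List[Tuple[str, str]]:
    """Spell an underspecified link out as explicit (dependent, head) alternatives."""
    form = tokens[dependent][0]
    return [(form, tokens[h][0])
            for h in candidate_heads(tokens, dependent, head_class, direction)]


sentence = [("this", "DET"), ("paper", "N"), ("addresses", "V"),
            ("three", "NUM"), ("main", "A"), ("issues", "N")]

print(as_ambiguity(sentence, 0, "N", ">"))  # [('this', 'paper'), ('this', 'issues')]
print(as_ambiguity(sentence, 4, "N", ">"))  # [('main', 'issues')]
```

Where only one candidate exists, as for main, the spelled-out ambiguity collapses to a single analysis. A fuller treatment would filter the candidates by further grammatical constraints, which this sketch does not attempt.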


