The Categorial Grammar approach to subcategorisation will be exemplified with a description of the UCG framework used in the ACQUILEX project (Sanfilippo, 1993b). For ease of presentation, only a brief summary is given -- see Sanfilippo (1993b) for a full account.
Words and phrases are represented as (typed) feature structures where orthographic, syntactic and semantic information is simultaneously represented as a conjunction of attribute-value pairs forming a sign:
[ORTH: orth
CAT: cat
SEM: sem]
The category attribute of a sign is either basic or complex. Basic
categories are binary feature structures consisting of a category
type and a set of attribute-value pairs encoding morphosyntactic
information:
[CAT-TYPE: cat-type
M-FEATS: m-feats]
Three basic cat-types are used. A basic category is abbreviated as cat-type[m-feats], e.g. np[acc] or sent[fin]; morphosyntactic features are included only where needed.
Complex categories are recursively defined by letting the type `cat' instantiate a feature structure with attributes RESult, DIRection and ACTive. RESult can take as value either a basic or complex category, ACTive is of type `sign', and the direction attribute encodes order of combination relative to the active part of the sign (e.g. forward or backward):
[RES: cat
DIR: dir
ACT: sign]
In verbs, the active part of the category structure encodes the
subcategorisation properties, e.g. subject and object for transitives:
[ORTH: < love >
CAT:[RES:[RES:sent
ACT:[np-sign
CAT:np[nom]]]
ACT:[np-sign
CAT:np[acc]]]]
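The way a complex category is consumed one active argument at a time can be sketched in Python. This is a minimal sketch under stated assumptions: the `love` structure above omits DIR, so the sketch assumes the object is found to the right (forward) and the subject to the left (backward). ACT is reduced to a bare category rather than a full sign, and matching uses structural equality rather than unification. The names `Basic`, `Complex` and `apply` are hypothetical and not part of the UCG/ACQUILEX system.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Basic:
    """Basic category: a cat-type plus an optional morphosyntactic feature."""
    cat_type: str
    m_feat: Optional[str] = None

    def __str__(self) -> str:
        # Abbreviated form cat-type[m-feats], e.g. np[acc]
        return f"{self.cat_type}[{self.m_feat}]" if self.m_feat else self.cat_type


@dataclass(frozen=True)
class Complex:
    """Complex category with RES, DIR and ACT (ACT simplified to a category)."""
    res: "Category"
    direction: str  # assumed values: "fwd" = argument on the right, "bwd" = on the left
    act: "Category"


Category = Union[Basic, Complex]


def apply(functor: Category, arg: Category, arg_on_right: bool) -> Optional[Category]:
    """Cancel the functor's ACT against arg; return RES, or None if they do not combine."""
    if not isinstance(functor, Complex):
        return None  # basic categories take no arguments
    expected = "fwd" if arg_on_right else "bwd"
    if functor.direction != expected or functor.act != arg:
        return None  # wrong side, or ACT does not match (equality, not unification)
    return functor.res


# love: the outermost ACT (object) is cancelled first, then the subject
love = Complex(
    res=Complex(res=Basic("sent"), direction="bwd", act=Basic("np", "nom")),
    direction="fwd",
    act=Basic("np", "acc"),
)

vp = apply(love, Basic("np", "acc"), arg_on_right=True)  # "love Mary"
s = apply(vp, Basic("np", "nom"), arg_on_right=False)    # "Jon loves Mary"
print(s)  # sent
```

Because each application strips exactly one RES/ACT layer, the nesting depth of the category equals the number of subcategorised arguments.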
The semantics of a sign is a formula. A formula consists of an index, a
predicate and at least one argument, which can be either an entity or a
formula (both subsumed by sem):
[IND: entity
PRED: pred
ARG1: sem]
The index of a formula is an entity which provides partial
information about the ontological type denoted by the formula, e.g. `e'
for eventualities and `o,x,y,z' for individual objects. In addition, a
contentless entity, `dummy', is employed in the semantic
characterisation of pleonastic noun phrases, e.g. the subject of
extraposition verbs. For ease of exposition, formulas are linearised;
e.g. the feature structure
[IND: [1] x
PRED: book
ARG1: [1]]
where [1] flags reentrant (i.e. identical) values, is
abbreviated as <x1>book(x1), where x1 is a named variable.
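The linearisation convention can be made concrete with a small sketch, assuming formulas are Python dicts with keys IND, PRED, ARG1, ARG2, ... and that a reentrancy tag such as [1] is modelled by reusing the same Python object in several slots. Named variables are then generated per object (sort letter plus a counter), so shared nodes receive the same name. `Entity` and `linearise` are hypothetical names, and string values stand for constants such as `john`.

```python
from typing import Dict, Optional


class Entity:
    """An index/argument node. Reusing one Entity object encodes a tag like [1]."""

    def __init__(self, sort: str):
        self.sort = sort  # ontological sort, e.g. "e" (eventuality) or "x" (object)


def linearise(formula: dict,
              names: Optional[Dict[Entity, str]] = None,
              counters: Optional[Dict[str, int]] = None) -> str:
    """Render {IND, PRED, ARG1, ARG2, ...} as <index>pred(arg1, arg2, ...)."""
    names = {} if names is None else names           # Entity object -> variable name
    counters = {} if counters is None else counters  # sort -> last number used

    def name_of(ent: Entity) -> str:
        # Same object => same name: this is how reentrancy shows up in the output
        if ent not in names:
            counters[ent.sort] = counters.get(ent.sort, 0) + 1
            names[ent] = f"{ent.sort}{counters[ent.sort]}"
        return names[ent]

    def render(value) -> str:
        if isinstance(value, Entity):
            return name_of(value)
        if isinstance(value, str):  # constants such as john
            return value
        return linearise(value, names, counters)  # embedded formula

    args = [formula[key] for key in sorted(formula) if key.startswith("ARG")]
    return f"<{name_of(formula['IND'])}>{formula['PRED']}({', '.join(render(a) for a in args)})"


x = Entity("x")
book = {"IND": x, "PRED": "book", "ARG1": x}  # [1] shared by IND and ARG1
print(linearise(book))  # <x1>book(x1)

e = Entity("e")
sleep = {"IND": e, "PRED": "and",
         "ARG1": {"IND": e, "PRED": "sleep", "ARG1": e},
         "ARG2": {"IND": e, "PRED": "agent", "ARG1": e, "ARG2": "john"}}
print(linearise(sleep))  # <e1>and(<e1>sleep(e1), <e1>agent(e1, john))
```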
The classification of subcategorisation types involves defining semantic structures, category structures and the linking patterns that relate them.
Verbs are characterised as properties of eventualities, and thematic roles as relations between eventualities and individuals, e.g.
<e1>and(<e1>sleep(e1), <e1>agent(e1,john))
Following Dowty (1989), the semantic content of thematic relations is expressed in terms of prototypical cluster-concepts -- the proto-agent and proto-patient roles (`p-agt', `p-pat') -- determined for each choice of predicate through the attribution of selected entailments which qualify the relative agentive strength and affectedness of event participants. Dowty's insights are augmented by introducing a third proto-role, `prep', for prepositional arguments (`semantically restricted' in LFG terms), and the contentless predicate `no-' to characterise the relation between a pleonastic NP and its governing verb. In addition, proto-roles are formalised as supersets of specific clusters of meaning components which are instrumental in the identification of semantic verb classes (Sanfilippo & Poznański, 1992; Sanfilippo, 1993b; Sanfilippo, 1993a); see the examples below.
A primary semantic classification of verb types is obtained in terms of argument arity. Further distinctions are made according to the kind of verbal arguments encoded (e.g. prepositional arguments or verbal complements). Here are some of the semantic structures distinguished:
STRICT-INTRANS-SEM
<e1>and(<e1>pred(e1), <e1>p-agt(e1,x))
STRICT-TRANS-SEM
<e1>and(<e1>pred(e1), <e1>and(<e1>p-agt(e1,x),
<e1>p-pat(e1,y)))
OBL-TRANS/DITRANS-SEM
<e1>and(<e1>pred(e1), <e1>and(<e1>p-agt(e1,x),
<e1>and(<e1>p-pat(e1,y),
<e1>prep(e1,z))))
P-AGT-SUBJ-INTRANS-XCOMP/COMP-SEM
<e1>and(<e1>pred(e1), <e1>and(<e1>p-agt(e1,x), verb-sem))
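The semantic structures above share one shape: the verb's own predicate conjoined, right-nested, with one proto-role formula per argument, all indexed by the same eventuality e1. The following minimal sketch generates these strings from a list of (role, variable) pairs. The function names are hypothetical, `pred` stands for the verb's own predicate as in the templates, and the sketch assumes each role binds its own individual (x, y, z), so the third argument uses z.

```python
def conj(index: str, *conjuncts: str) -> str:
    """Right-nested binary conjunction, each node indexed by the eventuality."""
    if len(conjuncts) == 1:
        return conjuncts[0]
    return f"<{index}>and({conjuncts[0]}, {conj(index, *conjuncts[1:])})"


def verb_sem(roles, tail=None, index="e1"):
    """Build a verb-type semantics from (role, variable) pairs.

    tail: an optional extra conjunct, e.g. the placeholder verb-sem of a complement.
    """
    parts = [f"<{index}>pred({index})"]
    parts += [f"<{index}>{role}({index},{var})" for role, var in roles]
    if tail is not None:
        parts.append(tail)
    return conj(index, *parts)


STRICT_INTRANS_SEM = verb_sem([("p-agt", "x")])
STRICT_TRANS_SEM = verb_sem([("p-agt", "x"), ("p-pat", "y")])
OBL_TRANS_DITRANS_SEM = verb_sem([("p-agt", "x"), ("p-pat", "y"), ("prep", "z")])
XCOMP_SEM = verb_sem([("p-agt", "x")], tail="verb-sem")

for sem in (STRICT_INTRANS_SEM, STRICT_TRANS_SEM, OBL_TRANS_DITRANS_SEM, XCOMP_SEM):
    print(sem)
```

Argument arity, the primary classification criterion, is simply the length of the role list.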
Category structures are distinguished according to the values for the features RES and ACT. For example, the CAT of strict intransitives states that the result is a basic category of type `sent' and the active part is a noun phrase (i.e. there is only subject selection):
STRICT-INTRANS-CAT
[RES: sent
ACT: np-sign]
More complex category types can be built using more basic category
types, e.g.
STRICT-TRANS-CAT
[RES: strict-intrans-cat
ACT: [np-sign
CAT: np[acc]]]
DITRANS-CAT
[RES: strict-trans-cat
ACT: [np-sign
CAT: np[acc]]]
OBL-TRANS-CAT
[RES: strict-trans-cat
ACT: [np-sign
CAT: np[p-case]]]
Control categories are used to describe the syntactic structure of
both equi and raising verbs. All control categories follow (inherit
from) the pattern below, where the reentrancy tag [1] says that the
complement active sign (i.e. the complement subject) is controlled by
the immediately preceding active sign (control is expressed by
equating the entities which partially describe the semantics of the
active signs):
CONTROL-CAT
[RES: [RES: cat
ACT: [sign
SEM:ARG2: [1] entity]]
ACT: [sign
CAT:ACT: [sign
SEM:ARG2: [1]]]]
The controlling argument is the subject if the verb is intransitive
and the object if it is transitive (transitivity is determined by the
presence of an accusative active np-sign).
INTRANS-CONTROL-CAT
[RES: [RES: sent
ACT: [sign
SEM:ARG2: [1] entity]]
ACT: [sign
CAT:ACT: [sign
SEM:ARG2: [1]]]]
TRANS-CONTROL-CAT
[RES: [RES: strict-intrans-cat
ACT: [np-sign
CAT: np[acc]
SEM:ARG2: [1] entity]]
ACT: [sign
CAT:ACT: [sign
SEM:ARG2: [1]]]]
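Control through the tag [1] can be illustrated with a small unification sketch. Reentrancy is modelled by placing one shared variable object in both slots, so binding it through the object NP also fixes the complement subject, as with an object-control verb such as persuade. This is a simplified destructive unifier written for illustration, not the ACQUILEX implementation: `Var`, `unify` and `get` are hypothetical, atoms must be equal to unify, and dict nodes are not forwarded.

```python
class Var:
    """An unbound value. Two slots holding the same Var are reentrant (tag [1])."""

    def __init__(self):
        self.ref = None  # set when the variable is bound


def deref(x):
    """Follow variable bindings to the current value."""
    while isinstance(x, Var) and x.ref is not None:
        x = x.ref
    return x


def unify(a, b) -> bool:
    """Destructively unify atoms, Vars and dicts; return False on a clash.

    Simplification: features found only in b are copied into a, and dict
    nodes are not forwarded, so b should not be shared elsewhere.
    """
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b
        return True
    if isinstance(b, Var):
        b.ref = a
        return True
    if isinstance(a, dict) and isinstance(b, dict):
        for key, value in b.items():
            if key in a:
                if not unify(a[key], value):
                    return False
            else:
                a[key] = value
        return True
    return a == b  # atomic values must be identical


def get(fs, path):
    """Follow a colon-separated feature path, e.g. "RES:ACT:SEM"."""
    for feat in path.split(":"):
        fs = deref(fs)[feat]
    return deref(fs)


# TRANS-CONTROL-CAT: the object's ARG2 and the complement subject's ARG2 share [1]
tag1 = Var()
trans_control_cat = {
    "RES": {"RES": "strict-intrans-cat",
            "ACT": {"CAT": "np[acc]", "SEM": {"ARG2": tag1}}},
    "ACT": {"CAT": {"ACT": {"SEM": {"ARG2": tag1}}}},
}

# Combining with an accusative object NP whose entity is mary binds [1] ...
ok = unify(get(trans_control_cat, "RES:ACT"), {"CAT": "np[acc]", "SEM": {"ARG2": "mary"}})
# ... so the controlled complement subject now denotes mary as well
print(ok, get(trans_control_cat, "ACT:CAT:ACT:SEM:ARG2"))  # True mary
```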
Actual control categories are built by adding further specialisations
to the control descriptions above. For example, the category structure
for intransitive control verbs which take an infinitive VP complement,
e.g. ``Jon wants/seems to leave'', is defined as follows:
INTRANS-VPINF-CONTROL-CAT
Inherits from INTRANS-CONTROL-CAT
[RES: strict-intrans-cat
ACT: [vp-sign
CAT:RES: sent[fin]]]
Verb signs are defined by linking active signs in the category
structure to argument slots in predicate-argument structures. This is
done by means of reentrancy links, as indicated by the tag [1]
in the structure for strict intransitive verbs below.
[strict-intrans-sign
CAT:ACT: [np-sign
SEM: [1] <e1>p-agt(e1,x)]
SEM: [strict-intrans-sem
<e1>and(<e1>pred(e1), [1])]]
Since templates are given only for verbs with at most three arguments,
only two additional general linking patterns are needed:
[two-arguments-verb-sign
CAT: [RES: [RES: sent
ACT: [sign
SEM: [1]]]
ACT: [sign
SEM: [2]]]
SEM: <e1> and(and(pred(e1),[1]),[2])]
[three-arguments-verb-sign
CAT: [RES: [RES: [RES: sent
ACT: [sign
SEM: [0]]]
ACT: [sign
SEM: [1]]]
ACT: [sign
SEM: [2]]]
SEM: <e1> and(and(and(pred(e1),[0]),[1]),[2])]
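The two linking patterns follow one recipe: the innermost ACT (the subject) is linked to the first tag and each further ACT to the next, while the SEM value conjoins the tags in that same order. The sketch below regenerates both patterns. The tags are plain strings standing in for the shared structure a real implementation would use, active signs are reduced to their SEM slot, and `linking_pattern` is a hypothetical name.

```python
def linking_pattern(n: int):
    """Return (CAT, SEM) of the general linking pattern for an n-argument verb sign."""
    if n not in (2, 3):
        raise ValueError("only two- and three-argument patterns are defined")
    tags = list(range(3 - n, 3))  # [1, 2] or [0, 1, 2], numbered as in the templates
    cat = "sent"
    for t in tags:  # innermost ACT (the subject) first, outermost ACT last
        cat = {"RES": cat, "ACT": {"SEM": f"[{t}]"}}
    sem = "pred(e1)"
    for t in tags:  # conjoin each argument's semantics in the same order
        sem = f"and({sem},[{t}])"
    return cat, f"<e1> {sem}"


for n in (2, 3):
    cat, sem = linking_pattern(n)
    print(n, sem)
```

Building CAT and SEM in the same loop order guarantees that the outermost active sign is always linked to the last conjunct.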
To conclude, here are some sample two-arguments-verb-sign and three-arguments-verb-sign structures:
STRICT-TRANS-SIGN
[CAT: strict-trans-cat
SEM: strict-trans-sem]
SUBJ-EQUI-INTRANS-VPINF-SIGN
[CAT: intrans-vpinf-control-cat
SEM: p-agt-subj-intrans-xcomp/comp-sem]
DITRANS-SIGN
[CAT: ditrans-cat
SEM: obl-trans/ditrans-sem]
OBL-TRANS-SIGN
[CAT: [RES: strict-intrans-cat
ACT: [np-sign
CAT: np[p-case]]]
SEM: intrans-obl-sem]
Note that the argument linked to the `prep' role in DITRANS-SIGN
and OBL-TRANS-SIGN above is the outermost sign in the category
structure, even though only in ditransitives does it precede the `theme'
object (the difference in word order is handled syntactically; see
Sanfilippo (1993b) and references therein).