 
 
 
 
 
 
 
  
  
The Longman Dictionary and Thesaurus
   
Introduction
The Longman Dictionary of Contemporary English (LDOCE) and the Longman
Lexicon of Contemporary English (LLOCE) have been used extensively in
pioneering work on extracting NLP lexicons from Machine-Readable
Dictionaries. Many of the insights for building large-scale NLP
lexicons are based on studies of these resources. Because of their
age, their organization and structure still follow traditional
dictionary-making practice, but certain features make them
particularly suitable for deriving NLP lexicons.
   
The Longman Dictionary of Contemporary English
The Longman Dictionary of Contemporary English (LDOCE) [Pro78] is a
medium-sized learner's dictionary with 45,000 entries and 65,000 word
senses. Entries are distinguished as homographs on the basis of the
historical origin of words and their part of speech, and each entry
may have one or more meanings. The entry-sense distribution for the
major parts of speech is shown in Table 3.1.
 
Table 3.1: Number of Entries and Senses in LDOCE
|  | Entries | Senses | Polysemy |
| Nouns | 23800 | 37500 | 1.6 |
| Verbs | 7921 | 15831 | 1.9 |
| Adjectives | 6922 | 11371 | 1.6 |
| Total | 38643 | 64702 | 1.7 |
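The Polysemy column of Table 3.1 appears to be the average number of senses per entry. A minimal Python sketch, assuming that reading (the names `counts` and `polysemy` are illustrative and not part of LDOCE), recomputes the column from the tabulated counts:

```python
# Entry and sense counts copied from Table 3.1.
counts = {
    "Nouns": (23800, 37500),
    "Verbs": (7921, 15831),
    "Adjectives": (6922, 11371),
}


def polysemy(entries: int, senses: int) -> float:
    """Average number of senses per entry (assumed meaning of the column)."""
    return senses / entries


total_entries = sum(entries for entries, _ in counts.values())
total_senses = sum(senses for _, senses in counts.values())

for pos, (entries, senses) in counts.items():
    print(f"{pos}: {polysemy(entries, senses):.2f}")
print(f"Total: {polysemy(total_entries, total_senses):.2f}")
```

Under this reading the totals (38643 entries, 64702 senses) and the noun, adjective and overall ratios agree with the table after rounding to one decimal place; the verb ratio comes out close to 2.0 rather than the tabulated 1.9.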
 
The information provided in entries comprises:
- Definitions using a limited set of 2000 Controlled Vocabulary
Words and 3000 derived words.
- Examples.
- Grammatical information on the constituent structure of the
complementation of words. LDOCE is best known for its high-quality
grammatical coding, but since the focus here is on semantics, these
codes are not described further.
- Usage labels in the form of codes and comments, covering
register, style (11 codes), dialect (20 codes) and region constraints
(9 codes).
- Subject Field codes and comments indicating the domain of
interest to which a meaning is related.
- Semantic codes either classifying nominal meanings or expressing
selectional restrictions for the complementation of verbal and
adjectival meanings.
Most of the information is stored in textual form. However, the usage
codes, the Subject Field codes and the semantic codes are stored in
coded form rather than as free text.
There are 100 main Subject Field codes, which can be subdivided, for
example:
- MD: medical
- MDZA: medical anatomy
- ON: occupation
- VH: vehicles
The Subject Field codes have been stored for 30% of the verb senses
and 59% of the noun senses. The 100 main fields have 246
subdivisions. Two main fields can also be combined: MDON, for
example, represents both medical and occupation.
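The way a Subject Field code is read can be sketched in Python. This is a minimal illustration, not the LDOCE tape format: it assumes that the first two letters name a main field and that letters three and four name either a subdivision or a second main field. The function name `decode_subject_code` is hypothetical, and the lookup tables contain only the codes quoted in the text.

```python
# Only the codes quoted in the text; the full inventory has 100 main
# fields and 246 subdivisions.
MAIN_FIELDS = {"MD": "medical", "ON": "occupation", "VH": "vehicles"}
SUBDIVISIONS = {"MDZA": "medical anatomy"}


def decode_subject_code(code: str) -> list[str]:
    """Return the field label(s) for a two- or four-letter code.

    Assumed layout: letters 1-2 name a main field; letters 3-4 name
    either a subdivision of it or a second main field.
    """
    if code in MAIN_FIELDS:
        return [MAIN_FIELDS[code]]
    if code in SUBDIVISIONS:
        return [SUBDIVISIONS[code]]
    first, second = code[:2], code[2:]
    if len(code) == 4 and first in MAIN_FIELDS and second in MAIN_FIELDS:
        # Combination of two main fields, e.g. MDON.
        return [MAIN_FIELDS[first], MAIN_FIELDS[second]]
    raise ValueError(f"unknown subject field code: {code!r}")


print(decode_subject_code("MDZA"))
print(decode_subject_code("MDON"))
```

Checking the subdivision table before trying the two-field split keeps a code such as MDZA from being misread as a combination.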
In total, there are 32 different semantic codes in LDOCE. A distinction
can be made between basic codes (19 codes) and codes that represent a
combination of basic codes (13 combinations):
- A: Animal
- B: Female Animal
- C: Concrete
- D: Male Animal
- E: Solid or Liquid (not gas) = S + L
- F: Female Human
- G: Gas
- H: Human
- I: Inanimate Concrete
- J: Movable Solid
- K: Male Animal or Human = D + M
- L: Liquid
- M: Male Human
- N: Not Movable Solid
- O: Animal or Human = A + H
- P: Plant
- Q: Animate
- R: Female = B + F
- S: Solid
- T: Abstract
- U: Collective Animal or Human = Collective + O
- V: Plant or Animal = P + A
- W: Inanimate Concrete or Abstract = T + I
- X: Abstract or Human = T + H
- Y: Abstract or Animate = T + Q
- Z: Unmarked
- 1: Human or Solid = H + S
- 2: Abstract or Solid = T + S
- 4: Abstract Physical
- 5: Organic Material
- 6: Liquid or Abstract = L + T
- 7: Gas or Liquid = G + L
The basic codes are organized into the hierarchy shown in Figure 3.1.
 
Figure 3.1: Hierarchy of semantic codes in LDOCE
Most noun senses have a semantic code. In the case of nouns, these
codes can be seen as a basic classification of the meaning. In the
case of verbs and adjectives, however, the codes indicate selectional
restrictions on their arguments. These selectional restrictions can
also be inferred from the definitions, in which constituents
corresponding to the complements of the defined verbs or adjectives
are placed between brackets.
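How a combination code might be used to test a selectional restriction can be sketched as follows. This is a minimal Python illustration under stated assumptions: only a subset of the combination codes listed earlier is included, the hierarchy of Figure 3.1 is not modelled, and the function names `expand` and `satisfies` are hypothetical rather than part of any LDOCE tooling.

```python
# A subset of the combination codes and their basic components.
COMBINATIONS = {
    "E": {"S", "L"},  # Solid or Liquid
    "K": {"D", "M"},  # Male Animal or Human
    "O": {"A", "H"},  # Animal or Human
    "R": {"B", "F"},  # Female
    "V": {"P", "A"},  # Plant or Animal
    "W": {"T", "I"},  # Inanimate Concrete or Abstract
    "X": {"T", "H"},  # Abstract or Human
    "1": {"H", "S"},  # Human or Solid
    "2": {"T", "S"},  # Abstract or Solid
    "6": {"L", "T"},  # Liquid or Abstract
    "7": {"G", "L"},  # Gas or Liquid
}


def expand(code: str) -> set[str]:
    """Basic codes covered by `code`; a basic code covers only itself."""
    return set(COMBINATIONS.get(code, {code}))


def satisfies(noun_code: str, restriction: str) -> bool:
    """True if every reading of the noun code is allowed by the restriction.

    Only combination codes are consulted. Because the hierarchy is not
    modelled, H (Human) is not recognised here as a kind of Q (Animate).
    """
    return expand(noun_code) <= expand(restriction)


# A verb whose subject is restricted to O (Animal or Human):
print(satisfies("H", "O"))  # Human noun
print(satisfies("L", "O"))  # Liquid noun
```

A fuller treatment would also walk the hierarchy of basic codes, so that a restriction to Animate would accept nouns coded Human or Animal.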
   
The Longman Lexicon of Contemporary English
LLOCE, the Longman Lexicon of Contemporary English, is a small
learner's dictionary largely derived from LDOCE and organized along
semantic principles. A quantitative profile of the information
provided is given in the table below.
| Number of entries | 16,000 |
| Number of senses | 25,000 |
| Semantic fields: major codes | 14 |
| Semantic fields: group codes | 127 |
| Semantic fields: set codes | 2441 |
| Grammar codes | same as LDOCE |
| Selectional restrictions | same as LDOCE |
| Domain & register labels | same as LDOCE |
 
Semantic classification in LLOCE is articulated in three tiers of
increasingly specific concepts, represented as major, group and set
codes, e.g.
        <MAJOR: A> Life and living things        
                          |
        <GROUP: A50-61> Animals/Mammals                
                          |
        <SET: A53> The cat and similar animals:
                      cat, leopard, lion, tiger,...
Each entry is associated with a set code, e.g.
        <SET: A53> nouns The cat and similar animals
        --------------------------------------------
        cat 1 a small domestic [=> A36] animal ... 
            2 any animal of a group ...
        ...
        panther [Wn1] 1 a leopard ... 
                      2 AmE cougar.
        ...
        <SET: A53> nouns The dog and similar animals
        --------------------------------------------
        dog a domestic animal with a coat of hair ...
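The three tiers can be represented with a small data structure. The sketch below is a hypothetical Python illustration, not an LLOCE file format: the class names `SetCode` and `Group` are invented here, and the only group included is the one quoted in the example above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SetCode:
    major: str   # major code letter, e.g. "A" (Life and living things)
    number: int  # set number within the major code, e.g. 53

    @classmethod
    def parse(cls, text: str) -> "SetCode":
        # Accepts "A53" as well as the spaced form "M 1".
        match = re.fullmatch(r"([A-N])\s*(\d+)", text.strip())
        if match is None:
            raise ValueError(f"not a set code: {text!r}")
        return cls(match.group(1), int(match.group(2)))


@dataclass(frozen=True)
class Group:
    major: str
    first: int
    last: int
    title: str

    def contains(self, code: SetCode) -> bool:
        """True if the set code falls inside this group's number range."""
        return code.major == self.major and self.first <= code.number <= self.last


# The one group quoted in the text.
ANIMALS = Group("A", 50, 61, "Animals/Mammals")

code = SetCode.parse("A53")
print(code, ANIMALS.contains(code))
```

Treating a group as a range of set numbers within one major code mirrors the notation A50-61 used in the example.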
Relations of semantic similarity between codes that are not expressed
hierarchically are cross-referenced, e.g.
        <SET: A53> nouns The cat and similar animals
        --------------------------------------------
        cat 1 a small domestic [=> A36] animal ...  
                               ^^^^^^^^
        <SET: A36> Man breeding living things
        -------------------------------------
        ....
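Cross-references of this kind can be pulled out of a definition with a regular expression. The following is a minimal Python sketch which assumes that every cross-reference is written exactly as in the example, that is, as `[=> CODE]`. The function name `cross_references` is hypothetical.

```python
import re

# Matches cross-references written as "[=> A36]" inside a definition.
XREF = re.compile(r"\[=>\s*([A-N]\d+)\]")


def cross_references(definition: str) -> list[str]:
    """Return the set codes that a definition points to, in order."""
    return XREF.findall(definition)


definition = "a small domestic [=> A36] animal ..."
print(cross_references(definition))
```

Other bracketed material, such as the grammar code in `panther [Wn1]`, does not match the pattern because it lacks the `=>` marker.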
There are 14 major codes, 127 group codes and 2441 set codes. The
list of major codes below provides a general idea of the semantic
areas covered:
        
        <A> Life and living things
        <B> The body, its functions and welfare
        <C> People and the family
        <D> Buildings, houses, the home, clothes, belongings, and personal care 
        <E> Food, drink, and farming
        <F> Feelings, emotions, attitudes, and sensations
        <G> Thought and communication, language and grammar
        <H> Substances, materials, objects, and equipment
        <I> Arts and crafts, sciences and technology, industry and education
        <J> Numbers, measurement, money, and commerce
        <K> Entertainment, sports, and games
        <L> Space and time
        <M> Movement, location, travel, and transport
        <N> General and abstract terms
The list of group and set codes for the M domain (Movement,
location, travel, and transport) given in Table 3.2 illustrates the
degree of detail used in the semantic classification.
 
Table 3.2: Set codes for the domain of Movement, location, travel
and transport in LLOCE.
| Moving, coming, and going |
| M 1 | moving, coming, and going |
| M 2 | (of a person or object) not moving |
| M 3 | stopping (a person or object) from moving |
| M 4 | leaving and setting out |
| M 5 | arriving, reaching, and entering |
| M 6 | letting in and out |
| M 7 | welcoming and meeting |
| M 8 | getting off, down, and out |
| M 9 | climbing and getting on |
| M 10 | movement and motion |
| M 11 | staying and stopping |
| M 12 | passages, arrivals, and departures |
| M 13 | climbing, ascending, and descending |
| M 14 | moving |
| M 15 | not moving |
| M 16 | moving quickly |
| M 17 | not moving quickly |
| M 18 | speed |
| M 19 | particular ways of moving |
| M 20 | walking unevenly, unsteadily, etc |
| M 21 | walking gently, etc |
| M 22 | walking strongly, etc |
| M 23 | walking long and far, etc |
| M 24 | running and moving quickly, etc |
| M 25 | running and moving lightly and quickly, etc |
| M 26 | crawling and creeping, etc |
| M 27 | loitering and lingering, etc |
| M 28 | flying in various ways |
| M 29 | driving and steering, etc |
| M 30 | going on a bicycle, etc |
| M 31 | moving faster and slower |
| M 32 | coming to a stop, moving away, etc |
| M 33 | hurrying and rushing |
| M 34 | following, chasing, and hunting |
| M 35 | escaping, etc |
| M 36 | things and persons chased, etc |
| M 37 | avoiding and dodging |
| M 38 | leaving and deserting |
| M 39 | moving forward, etc |
| M 40 | turning, twisting, and bending |
| M 41 | flowing |
| M 42 | coasting and drifting |
| M 43 | bouncing and bobbing |
| Putting and taking, pulling and pushing |
| M 50 | putting and placing |
| M 51 | carrying, taking, and bringing |
| M 52 | sending and transporting |
| M 53 | taking, leading, and escorting |
| M 54 | sending and taking away |
| M 55 | showing and directing |
| M 56 | pulling |
| M 57 | pulling out |
| M 58 | pushing |
| M 59 | throwing |
| M 60 | throwing things and sending things out |
| M 61 | extracting and withdrawing |
| M 62 | sticking and wedging |
| M 63 | closing, shutting, and sealing |
| M 64 | fastening and locking |
| M 65 | opening and unlocking |
| M 66 | open and not open |
| M 67 | openings |
| Travel and visiting |
| M 70 | visiting |
| M 71 | inviting and summoning people |
| M 72 | meeting people and things |
| M 73 | visiting and inviting |
| M 74 | travelling |
| M 75 | travelling |
| M 76 | people visiting and travelling |
| M 77 | people guiding and taking |
| M 78 | travel businesses |
| M 79 | hotels, etc |
| M 80 | in hotels, etc |
| M 81 | people in hotels, etc |
| M 82 | in hotels, travelling, etc |
| M 83 | in hotels, travelling, etc |
| Vehicles and transport on land |
| M 90 | transport |
| M 91 | vehicles generally |
| M 92 | special, usu older, kinds of vehicles |
| M 93 | lighter motor vehicles, etc |
| M 94 | heavier motor vehicles |
| M 95 | buses, etc |
| M 96 | bicycles and motorcycles, etc |
| M 97 | persons driving vehicles, etc |
| M 98 | smaller special vehicles, etc |
| M 99 | vehicles for living in |
| M 100 | parts of vehicles outside |
| M 101 | parts of vehicles inside |
| M 102 | the chassis and the engine |
| M 103 | parts of a bicycle |
| M 104 | related to motorcycles |
| M 105 | garages and servicing |
| M 106 | trams |
| M 107 | railways |
| M 108 | trains |
| M 109 | places relating to railways, travel, etc |
| M 110 | persons working on railways, etc |
| M 111 | driving and travelling by car, etc |
| M 112 | crashes and accidents |
| Places |
| M 120 | places and positions |
| M 121 | space |
| M 122 | edges, boundaries, and borders |
| M 123 | neighbourhoods and environments |
| M 124 | at home and abroad |
| M 125 | roads and routes |
| M 126 | special roads and streets in towns |
| M 127 | special roads and streets in the country |
| M 128 | special streets in towns |
| M 129 | very large modern roads |
| M 130 | no-entries and cul-de-sacs |
| M 131 | paths and tracks |
| M 132 | parts of roads, etc |
| M 133 | lights on roads, etc |
| M 134 | bends and bumps, etc |
| M 135 | intersections and bypasses |
| M 136 | bridges and tunnels |
| Shipping |
| M 150 | boats |
| M 151 | boats in general |
| M 152 | smaller kinds of boats |
| M 153 | larger kinds of sailing boats |
| M 154 | powered ships |
| M 155 | ships with special uses |
| M 156 | merchant ships, etc |
| M 157 | parts of ships |
| M 158 | positions on ships, etc |
| M 159 | harbours and yards |
| M 160 | quays and docks |
| M 161 | lighthouses, buoys, etc |
| M 162 | crews |
| M 163 | sailors, etc |
| M 164 | ship's officers, etc |
| M 165 | mooring and docking |
| M 166 | setting sail |
| M 167 | oars and paddles |
| M 168 | floating and sinking, etc |
| M 169 | wrecking and marooning, etc |
| Aircraft |
| M 180 | aircraft and aviation |
| M 181 | jet aeroplanes |
| M 182 | balloons, etc |
| M 183 | helicopters |
| M 184 | spaceships |
| M 185 | airports |
| M 186 | parts of aircraft |
| M 187 | landing and taking off |
| M 188 | landing and taking off |
| M 189 | people working on and with aeroplanes |
| Location and direction |
| M 200 | surfaces and edges |
| M 201 | higher and lower positions in objects, space, etc |
| M 202 | front, back, and sides |
| M 203 | about and around, etc |
| M 204 | in, into, at, etc |
| M 205 | out, from, etc |
| M 206 | here and not here |
| M 207 | across, through, etc |
| M 208 | against |
| M 209 | near |
| M 210 | far |
| M 211 | between and among |
| M 212 | away and apart |
| M 213 | back and aside |
| M 214 | to and towards |
| M 215 | from place to place |
| M 216 | on and upon |
| M 217 | off |
| M 218 | below, beneath, and under |
| M 219 | above and over |
| M 220 | after and behind |
| M 221 | in front, before, and ahead |
| M 222 | through and via |
| M 223 | past and beyond |
| M 224 | up |
| M 225 | down |
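Since each group in Table 3.2 occupies a contiguous block of set numbers, the group of an M set code can be found by a range lookup. The Python sketch below assumes the number ranges read off the table; the names `M_GROUPS` and `group_of` are illustrative only.

```python
from bisect import bisect_right

# (first set number, last set number, group title), read off Table 3.2.
M_GROUPS = [
    (1, 43, "Moving, coming, and going"),
    (50, 67, "Putting and taking, pulling and pushing"),
    (70, 83, "Travel and visiting"),
    (90, 112, "Vehicles and transport on land"),
    (120, 136, "Places"),
    (150, 169, "Shipping"),
    (180, 189, "Aircraft"),
    (200, 225, "Location and direction"),
]
STARTS = [first for first, _, _ in M_GROUPS]


def group_of(number: int):
    """Group title for set code M<number>, or None if the number is unused."""
    index = bisect_right(STARTS, number) - 1
    if index >= 0:
        _, last, title = M_GROUPS[index]
        if number <= last:
            return title
    return None


print(group_of(29))   # driving and steering, etc
print(group_of(154))  # powered ships
print(group_of(45))   # falls in the gap between two groups
```

The gaps between ranges (for example M 44 to M 49) are left unassigned, which is why the lookup checks the upper bound as well as the lower one.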
 
  
Comparison with Other Lexical Databases
 
LDOCE is a traditional Machine-Readable Dictionary. However, because
of its controlled vocabulary, the systematic coding of its
information and its elaborate use of codes, it has been a very useful
starting point for deriving basic NLP lexicons. [Bri89] give an
extensive description of the possibilities for elaboration. Apart
from the semantic features, LDOCE does not contain complete semantic
hierarchies of the kind found in WordNet, EDR or other ontologies.
The bottom level of word sense clustering in LLOCE consists of sets of
semantically related words which need not be synonyms. For example,
the set D172 (baths and showers) contains nouns such as bath, shave
and shower. This contrasts with lexical databases such as WordNet,
where synsets are meant to contain synonymous word senses.
A further difference from WordNet concerns taxonomic organization. In
WordNet, hierarchical relations are mainly encoded as hyp(er)onymic
links forming chains of synsets whose length can vary considerably. In
LLOCE there are only three tiers and considerable
cross-referencing. Moreover, only the terminal leaves of the LLOCE
taxonomy correspond to actual word senses; the labels associated with
intermediate levels (major, group and set codes) are abstractions over
sets of semantically related word senses, just like the intermediate
concepts used in the EDR (see §3.6).
   
Relations to Notions of Lexical Semantics
The semantic codes for nouns in LDOCE represent a minimal and
shallow classification. The LLOCE classification is more elaborate
but still not very deep. This classification information is similar
to the taxonomic models described in §2.7.
In addition, LLOCE combines the entry format of LDOCE, which provides
detailed syntactic information (in the form of grammar codes), with
the semantic structure of a thesaurus. This combination is
particularly well suited for relating syntactic and semantic
properties of words, and in particular for identifying dependencies
between semantic predicate classes and subcategorization frames as
described in §2.4.
   
LE Uses
LDOCE has been most useful as a syntactic lexicon for parsing. Its
use as a semantic resource is not as widespread as one would
expect. This is mainly due to its restricted availability and the
fact that it still requires considerable processing to derive a
full-coverage NLP lexicon from it. [Bri89] give an overview of the
different kinds of NLP lexicons that can be derived from
it. [Vos95b] describe how a richly encoded semantic lexicon with
weighted features can be derived and used in an information
retrieval task.
[San92a] and [San93b] use LLOCE to
derive verb entries with detailed semantic frame
information. [Poz96] describe a system which
uses LLOCE to assign semantic tags to verbs in bracketed corpora to
elicit dependencies between semantic verb classes and their admissible
subcategorization frames.
 
 
 
 
 
 
 
  
EAGLES Central Secretariat eagles@ilc.cnr.it