

  
The Longman Dictionary and Thesaurus

   
Introduction

The Longman Dictionary of Contemporary English (LDOCE) and the Longman Lexicon of Contemporary English (LLOCE) have been used extensively in pioneering work on extracting NLP lexicons from Machine-Readable Dictionaries, and many of the insights for building large-scale NLP lexicons are based on studies of these resources. Because of their age, their organization and structure still follow traditional dictionary-making practice, but certain features make them particularly suitable for deriving NLP lexicons.

   
The Longman Dictionary of Contemporary English

The Longman Dictionary of Contemporary English (LDOCE) [Pro78] is a medium-sized learner's dictionary with 45,000 entries and 65,000 word senses. Entries are distinguished as homographs on the basis of the historical origin of words and their part of speech, and each entry may have one or more meanings. The distribution of entries and senses over the major parts of speech is shown in Table 3.1.

 
Table 3.1: Number of Entries and Senses in LDOCE
             Entries  Senses  Polysemy
Nouns          23800   37500       1.6
Verbs           7921   15831       1.9
Adjectives      6922   11371       1.6
Total          38643   64702       1.7
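
The Polysemy column can be read as the average number of senses per entry. The following is a minimal sketch of that calculation, assuming the column is simply senses divided by entries; the function names are illustrative and not part of LDOCE. Under this assumption the verb ratio comes out at about 2.0 rather than the 1.9 printed in the table, while the other rows agree after rounding.

```python
# Entry and sense counts copied from Table 3.1.
COUNTS = {
    "Nouns": (23800, 37500),
    "Verbs": (7921, 15831),
    "Adjectives": (6922, 11371),
}


def polysemy(entries, senses):
    """Average number of senses per entry (assumed meaning of 'Polysemy')."""
    return senses / entries


def summarize(counts):
    """Return per-class rows plus a 'Total' row summed over all classes."""
    rows = {pos: (e, s, polysemy(e, s)) for pos, (e, s) in counts.items()}
    total_entries = sum(e for e, _ in counts.values())
    total_senses = sum(s for _, s in counts.values())
    rows["Total"] = (total_entries, total_senses,
                     polysemy(total_entries, total_senses))
    return rows


if __name__ == "__main__":
    for pos, (e, s, ratio) in summarize(COUNTS).items():
        print(f"{pos:<11}{e:>8}{s:>8}{ratio:>10.1f}")
```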
 

The information provided in entries comprises:

Most of the information is stored in textual form. However, the usage codes, the subject-field code and the semantic codes are stored in the form of a unique code system.

There are 100 main Subject Field codes, each of which can be subdivided, for example:

MD    medical
MDZA  medical anatomy
ON    occupation
VH    vehicles

Subject Field codes have been stored for 30% of the verb senses and 59% of the noun senses. In addition to the 100 main fields there are 246 subdivisions. Two main fields can also be combined: MDON, for example, represents both medical and occupation.
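
A four-letter code can therefore be either a main field with a subdivision (MDZA) or a combination of two main fields (MDON). The sketch below shows one way such codes might be decoded. It assumes that codes are built from fixed two-letter units and that a known subdivision takes precedence over a combination; both assumptions, the table contents beyond the four examples above, and the function name are illustrative rather than taken from LDOCE.

```python
from typing import List

# Only the three main fields named in the text; the full inventory has 100.
MAIN_FIELDS = {"MD": "medical", "ON": "occupation", "VH": "vehicles"}

# Subdivisions keyed by (main field, subdivision letters); the full
# inventory has 246. Only the example from the text is listed here.
SUBDIVISIONS = {("MD", "ZA"): "medical anatomy"}


def decode_subject_code(code: str) -> List[str]:
    """Decode a 2- or 4-letter subject field code into readable labels."""
    if len(code) not in (2, 4):
        raise ValueError(f"unexpected code length: {code!r}")
    main, rest = code[:2], code[2:]
    if main not in MAIN_FIELDS:
        raise KeyError(f"unknown main field: {main!r}")
    if not rest:
        return [MAIN_FIELDS[main]]
    # Assumption: a known subdivision wins over a two-field combination.
    if (main, rest) in SUBDIVISIONS:
        return [SUBDIVISIONS[(main, rest)]]
    if rest in MAIN_FIELDS:
        return [MAIN_FIELDS[main], MAIN_FIELDS[rest]]
    raise KeyError(f"unknown subdivision or field: {rest!r}")
```

With these tables, `decode_subject_code("MDON")` yields both "medical" and "occupation", whereas `decode_subject_code("MDZA")` yields the single label "medical anatomy".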

In total, there are 32 different semantic codes in LDOCE. A distinction can be made between basic codes (19 codes) and codes that represent a combination of basic codes (13 combinations):

A  Animal
B  Female Animal
C  Concrete
D  Male Animal
E  Solid or Liquid (not gas) = S + L
F  Female Human
G  Gas
H  Human
I  Inanimate Concrete
J  Movable Solid
K  Male Animal or Human = D + M
L  Liquid
M  Male Human
N  Not Movable Solid
O  Animal or Human = A + H
P  Plant
Q  Animate
R  Female = B + F
S  Solid
T  Abstract
U  Collective Animal or Human = Collective + O
V  Plant or Animal = P + A
W  Inanimate Concrete or Abstract = T + I
X  Abstract or Human = T + H
Y  Abstract or Animate = T + Q
Z  Unmarked
1  Human or Solid = H + S
2  Abstract or Solid = T + S
4  Abstract Physical
5  Organic Material
6  Liquid or Abstract = L + T
7  Gas or Liquid = G + L

The basic codes are organized into the hierarchy shown in Figure 3.1.
 

Figure 3.1: Hierarchy of semantic codes in LDOCE


Most noun senses have a semantic code. For nouns, these codes can be seen as a basic classification of the meaning. For verbs and adjectives, however, the codes indicate selectional restrictions on their arguments. These selectional restrictions can also be inferred from the definitions, in which constituents corresponding to the complements of the defined verb or adjective are placed between brackets.
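
To illustrate how a combined code can act as a selectional restriction, the sketch below expands a few combined codes into their basic codes and tests whether a noun's code is admitted by the code on a verb or adjective argument. It is a simplification under stated assumptions: only a handful of the combinations listed above are included, the function names are hypothetical, and the hierarchy of Figure 3.1 is not modelled, so a more specific basic code (for example F, Female Human) is not recognised as an instance of a more general one (H, Human).

```python
# A subset of the combined codes from the list above, expanded into
# the basic codes they stand for.
COMBINED = {
    "E": {"S", "L"},  # Solid or Liquid
    "K": {"D", "M"},  # Male Animal or Human
    "O": {"A", "H"},  # Animal or Human
    "R": {"B", "F"},  # Female
    "V": {"P", "A"},  # Plant or Animal
    "W": {"T", "I"},  # Inanimate Concrete or Abstract
    "X": {"T", "H"},  # Abstract or Human
}


def expand(code):
    """Return the set of basic codes a code stands for.

    Codes not listed in COMBINED are treated as basic codes.
    """
    return set(COMBINED.get(code, {code}))


def satisfies(noun_code, restriction):
    """True if every reading of the noun code is admitted by the restriction.

    Assumption: admission is plain set inclusion over basic codes; the
    hierarchy among basic codes is deliberately ignored in this sketch.
    """
    return expand(noun_code) <= expand(restriction)
```

For instance, a noun coded H (Human) satisfies an argument restriction O (Animal or Human), whereas a noun coded A (Animal) does not satisfy X (Abstract or Human).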

   
The Longman Lexicon of Contemporary English

LLOCE, the Longman Lexicon of Contemporary English, is a small learner-style dictionary, largely derived from LDOCE and organized along semantic principles. A quantitative profile of the information it provides is given in the table below.

Number of entries           16,000
Number of senses            25,000
Semantic fields
  Major codes               14
  Group codes               127
  Set codes                 2441
Grammar codes               same as LDOCE
Selectional restrictions    same as LDOCE
Domain & register labels    same as LDOCE

Semantic classification in LLOCE is articulated in three tiers of increasingly specific concepts, represented as major, group and set codes, e.g.


        <MAJOR: A> Life and living things        

                          |

        <GROUP: A50-61> Animals/Mammals                

                          |

        <SET: A53> The cat and similar animals:

                      cat, leopard, lion, tiger,...

Each entry is associated with a set code, e.g.

        <SET: A53> nouns The cat and similar animals

        --------------------------------------------

        cat 1 a small domestic [=> A36] animal ... 

            2 any animal of a group ...

        ...

        panther [Wn1] 1 a leopard ... 

                      2 AmE cougar.

        ...



        <SET: A54> nouns The dog and similar animals

        --------------------------------------------

        dog a domestic animal with a coat of hair ...
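
The three tiers can be recovered from the code itself. The following sketch, offered as an illustration rather than a description of any existing LLOCE tool, splits a set code such as A53 into its major letter and set number and checks it against a group range written in the style A50-61. The names `SetCode`, `parse_set_code` and `in_group` are hypothetical, and the pattern assumes major codes run from A to N as in the list of major codes.

```python
import re
from typing import NamedTuple


class SetCode(NamedTuple):
    major: str   # single letter naming the major code, e.g. "A"
    number: int  # set number within that major code, e.g. 53


# Optional whitespace is allowed so that "M 225" and "A53" both parse.
_SET_RE = re.compile(r"^([A-N])\s*(\d+)$")
_GROUP_RE = re.compile(r"^([A-N])(\d+)-(\d+)$")


def parse_set_code(text):
    """Split a set code like 'A53' into its major letter and number."""
    match = _SET_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a set code: {text!r}")
    return SetCode(match.group(1), int(match.group(2)))


def in_group(code, group_range):
    """Check whether a set code falls inside a group range like 'A50-61'."""
    match = _GROUP_RE.match(group_range)
    if match is None:
        raise ValueError(f"not a group range: {group_range!r}")
    major = match.group(1)
    low, high = int(match.group(2)), int(match.group(3))
    return code.major == major and low <= code.number <= high
```

Here `in_group(parse_set_code("A53"), "A50-61")` holds, while the cross-referenced set A36 lies outside that group.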

Relations of semantic similarity between codes that are not expressed hierarchically are cross-referenced, e.g.

        <SET: A53> nouns The cat and similar animals

        --------------------------------------------

        cat 1 a small domestic [=> A36] animal ...  

                               ^^^^^^^^



        <SET: A36> Man breeding living things

        -------------------------------------

        ....
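
Cross-references of this kind can be pulled out of definition text mechanically. The sketch below assumes that every cross-reference has the bracketed form shown in the example, `[=> A36]`; the regular expression and function names are illustrative and other bracketed material, such as grammar codes, is left untouched.

```python
import re

# Matches a bracketed cross-reference such as "[=> A36]" and captures
# the set code. Assumption: major codes run from A to N.
XREF_RE = re.compile(r"\[=>\s*([A-N]\d+)\]")


def cross_references(definition):
    """Return the set codes cross-referenced inside a definition."""
    return XREF_RE.findall(definition)


def strip_cross_references(definition):
    """Return the definition text with cross-reference markers removed."""
    without = XREF_RE.sub("", definition)
    # Collapse the double space left behind by the removed marker.
    return re.sub(r"\s+", " ", without).strip()
```

Applied to the first sense of cat, `cross_references` returns the single code A36, which could then be used to link set A53 to set A36.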

There are 14 major codes, 127 group codes and 2441 set codes. The list of major codes below provides a general idea of the semantic areas covered:


        

        <A> Life and living things

        <B> The body, its functions and welfare

        <C> People and the family

        <D> Buildings, houses, the home, clothes, belongings, and personal care 

        <E> Food, drink, and farming

        <F> Feelings, emotions, attitudes, and sensations

        <G> Thought and communication, language and grammar

        <H> Substances, materials, objects, and equipment

        <I> Arts and crafts, sciences and technology, industry and education

        <J> Numbers, measurement, money, and commerce

        <K> Entertainment, sports, and games

        <L> Space and time

        <M> Movement, location, travel, and transport

        <N> General and abstract terms

The list of group and set codes for the M domain (Movement, location, travel, and transport) given in Table 3.2 illustrates the level of detail used in the semantic classification.
 
Table 3.2: Set codes for the domain of Movement, location, travel, and transport in LLOCE.
Moving, coming, and going
M 1 moving, coming, and going
M 2 (of a person or object) not moving
M 3 stopping (a person or object) from moving
M 4 leaving and setting out
M 5 arriving, reaching, and entering
M 6 letting in and out
M 7 welcoming and meeting
M 8 getting off, down, and out
M 9 climbing and getting on
M 10 movement and motion
M 11 staying and stopping
M 12 passages, arrivals, and departures
M 13 climbing, ascending, and descending
M 14 moving
M 15 not moving
M 16 moving quickly
M 17 not moving quickly
M 18 speed
M 19 particular ways of moving
M 20 walking unevenly, unsteadily, etc
M 21 walking gently, etc
M 22 walking strongly, etc
M 23 walking long and far, etc
M 24 running and moving quickly, etc
M 25 running and moving lightly and quickly, etc
M 26 crawling and creeping, etc
M 27 loitering and lingering, etc
M 28 flying in various ways
M 29 driving and steering, etc
M 30 going on a bicycle, etc
M 31 moving faster and slower
M 32 coming to a stop, moving away, etc
M 33 hurrying and rushing
M 34 following, chasing, and hunting
M 35 escaping, etc
M 36 things and persons chased, etc
M 37 avoiding and dodging
M 38 leaving and deserting
M 39 moving forward, etc
M 40 turning, twisting, and bending
M 41 flowing
M 42 coasting and drifting
M 43 bouncing and bobbing
Putting and taking, pulling and pushing
M 50 putting and placing
M 51 carrying, taking, and bringing
M 52 sending and transporting
M 53 taking, leading, and escorting
M 54 sending and taking away
M 55 showing and directing
M 56 pulling
M 57 pulling out
M 58 pushing
M 59 throwing
M 60 throwing things and sending things out
M 61 extracting and withdrawing
M 62 sticking and wedging
M 63 closing, shutting, and sealing
M 64 fastening and locking
M 65 opening and unlocking
M 66 open and not open
M 67 openings
Travel and visiting
M 70 visiting
M 71 inviting and summoning people
M 72 meeting people and things
M 73 visiting and inviting
M 74 travelling
M 75 travelling
M 76 people visiting and travelling
M 77 people guiding and taking
M 78 travel businesses
M 79 hotels, etc
M 80 in hotels, etc
M 81 people in hotels, etc
M 82 in hotels, travelling, etc
M 83 in hotels, travelling, etc
Vehicles and transport on land
M 90 transport
M 91 vehicles generally
M 92 special, usu older, kinds of vehicles
M 93 lighter motor vehicles, etc
M 94 heavier motor vehicles
M 95 buses, etc
M 96 bicycles and motorcycles, etc
M 97 persons driving vehicles, etc
M 98 smaller special vehicles, etc
M 99 vehicles for living in
M 100 parts of vehicles outside
M 101 parts of vehicles inside
M 102 the chassis and the engine
M 103 parts of a bicycle
M 104 related to motorcycles
M 105 garages and servicing
M 106 trams
M 107 railways
M 108 trains
M 109 places relating to railways, travel, etc
M 110 persons working on railways, etc
M 111 driving and travelling by car, etc
M 112 crashes and accidents
Places
M 120 places and positions
M 121 space
M 122 edges, boundaries, and borders
M 123 neighbourhoods and environments
M 124 at home and abroad
M 125 roads and routes
M 126 special roads and streets in towns
M 127 special roads and streets in the country
M 128 special streets in towns
M 129 very large modern roads
M 130 no-entries and cul-de-sacs
M 131 paths and tracks
M 132 parts of roads, etc
M 133 lights on roads, etc
M 134 bends and bumps, etc
M 135 intersections and bypasses
M 136 bridges and tunnels
Shipping
M 150 boats
M 151 boats in general
M 152 smaller kinds of boats
M 153 larger kinds of sailing boats
M 154 powered ships
M 155 ships with special uses
M 156 merchant ships, etc
M 157 parts of ships
M 158 positions on ships, etc
M 159 harbours and yards
M 160 quays and docks
M 161 lighthouses, buoys, etc
M 162 crews
M 163 sailors, etc
M 164 ship's officers, etc
M 165 mooring and docking
M 166 setting sail
M 167 oars and paddles
M 168 floating and sinking, etc
M 169 wrecking and marooning, etc
Aircraft
M 180 aircraft and aviation
M 181 jet aeroplanes
M 182 balloons, etc
M 183 helicopters
M 184 spaceships
M 185 airports
M 186 parts of aircraft
M 187 landing and taking off
M 188 landing and taking off
M 189 people working on and with aeroplanes
Location and direction
M 200 surfaces and edges
M 201 higher and lower positions in objects, space, etc
M 202 front, back, and sides
M 203 about and around, etc
M 204 in, into, at, etc
M 205 out, from, etc
M 206 here and not here
M 207 across, through, etc
M 208 against
M 209 near
M 210 far
M 211 between and among
M 212 away and apart
M 213 back and aside
M 214 to and towards
M 215 from place to place
M 216 on and upon
M 217 off
M 218 below, beneath, and under
M 219 above and over
M 220 after and behind
M 221 in front, before, and ahead
M 222 through and via
M 223 past and beyond
M 224 up
M 225 down
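
In the listing above, each group heading is followed by the set codes that belong to it. As a rough illustration of how this layout could be turned into a data structure, the sketch below reads such a listing line by line; it assumes that set lines always begin with "M", a number and a title, and that any other non-empty line is a group heading. The function names and the short excerpt are for demonstration only.

```python
import re

# A set line looks like "M 181 jet aeroplanes".
SET_LINE = re.compile(r"^M\s+(\d+)\s+(.*)$")

EXCERPT = """\
Aircraft
M 180 aircraft and aviation
M 181 jet aeroplanes
Location and direction
M 200 surfaces and edges
M 201 higher and lower positions in objects, space, etc
"""


def parse_listing(lines):
    """Map each group heading to its list of (set number, title) pairs."""
    groups = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SET_LINE.match(line)
        if match is None:
            # Any line that is not a set line starts a new group.
            current = line
            groups[current] = []
        elif current is None:
            raise ValueError("set line found before any group heading")
        else:
            groups[current].append((int(match.group(1)), match.group(2)))
    return groups


def group_ranges(groups):
    """Return the first and last set number of every non-empty group."""
    return {name: (sets[0][0], sets[-1][0])
            for name, sets in groups.items() if sets}
```

Run over the full table, `group_ranges` would give the numeric span of each group in the M domain, which is the information needed to place a set code in its group.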
 

  
Comparison with Other Lexical Databases

LDOCE is a traditional Machine-Readable Dictionary. However, because of its controlled vocabulary, the systematic coding of information and the elaborate use of codes, it has been a very useful starting point for deriving basic NLP lexicons. [Bri89] give an extensive description of the possibilities for elaboration. Apart from the semantic features, LDOCE does not contain complete semantic hierarchies of the kind found in WordNet, EDR or other ontologies.

The bottom level of word sense clustering in LLOCE consists of sets of semantically related words which need not be synonyms. For example, the set D172 (baths and showers) contains nouns such as bath, shave and shower. This contrasts with lexical databases such as WordNet, where synsets are meant to contain synonymous word senses.

A further difference from WordNet concerns taxonomic organization. In WordNet, hierarchical relations are mainly encoded as hyp(er)onymic links forming chains of synsets whose length can vary considerably. In LLOCE there are only three tiers and considerable cross-referencing. Moreover, only the terminal leaves of the LLOCE taxonomy correspond to actual word senses; the labels associated with intermediate levels (major, group and set codes) are abstractions over sets of semantically related word senses, just like the intermediate concepts used in the EDR (see §3.6).

   
Relations to Notions of Lexical Semantics

The semantic codes for nouns in LDOCE represent a very minimal and shallow classification. The LLOCE classification is more elaborate but still not very deep. This classification information is similar to the taxonomic models described in §2.7.

In addition, LLOCE combines the entry format of LDOCE, which provides detailed syntactic information (in the form of grammar codes), with the semantic structure of a thesaurus. This combination is particularly well suited to relating syntactic and semantic properties of words, and in particular to identifying dependencies between semantic predicate classes and subcategorization frames as described in §2.4.

   
LE Uses

LDOCE has been most useful as a syntactic lexicon for parsing. Its use as a semantic resource is not as widespread as one would expect. This is mainly due to its restricted availability and to the fact that considerable processing is still required to derive a full-coverage NLP lexicon from it. [Bri89] give an overview of the different kinds of NLP lexicons that can be derived from it. [Vos95b] describe how a richly encoded semantic lexicon with weighted features can be derived from it and used in an information retrieval task.

[San92a] and [San93b] use LLOCE to derive verb entries with detailed semantic frame information. [Poz96] describe a system which uses LLOCE to assign semantic tags to verbs in bracketed corpora to elicit dependencies between semantic verb classes and their admissible subcategorization frames.
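
The kind of dependency mentioned here, between a semantic verb class and the subcategorization frames it admits, can be pictured as a simple tally. The sketch below is not the method of [Poz96]; it only counts how often each frame occurs with each LLOCE set code. The observations, the assignment of verbs to set codes and the frame labels are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (verb, LLOCE set code, observed frame).
# Neither the verb-to-set assignments nor the frame labels come from LLOCE.
OBSERVATIONS = [
    ("carry", "M51", "NP"),
    ("bring", "M51", "NP PP"),
    ("carry", "M51", "NP PP"),
    ("walk", "M19", "PP"),
    ("walk", "M19", "intransitive"),
]


def frames_by_class(observations):
    """Count how often each frame is observed for each semantic class."""
    table = defaultdict(Counter)
    for _verb, set_code, frame in observations:
        table[set_code][frame] += 1
    return table


def relative_frequencies(table):
    """Turn the raw counts into per-class relative frequencies."""
    return {
        set_code: {frame: n / sum(counts.values())
                   for frame, n in counts.items()}
        for set_code, counts in table.items()
    }
```

A class whose frequency mass concentrates on a few frames would suggest a dependency worth inspecting, although real data would be needed before drawing any conclusion.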



EAGLES Central Secretariat eagles@ilc.cnr.it