

Unified Medical Language System


The Unified Medical Language System (UMLS) is a set of knowledge sources developed as experimental products by the US National Library of Medicine (NLM). It consists of four components, namely a Metathesaurus, a Semantic Network, a Specialist Lexicon and an Information Sources Map, and contains information about medical terms and their interrelationships.


The Metathesaurus

The Metathesaurus contains syntactic and semantic information about medical terms that appear in 38 controlled vocabularies and classifications, such as SNOMED, MeSH and ICD. It is organised by concept and contains over 330,000 concepts and 739,439 terms. It also records syntactic variants of terms, represented as strings. The representation has three levels: a set of general concepts (each identified by a code), a set of concept names (each identified by another, related code) and a set of strings (each identified by a further code together with the lexical string itself). An illustrative example is given in Figure 3.3. Meanings and relationships from the source vocabularies are preserved, but some additional information is provided, and new relationships are established between concepts and terms from different sources.
Figure 3.3: Fragment of Concept Hierarchy.
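The three-level organisation (concept, concept name, string) can be pictured as a nested data structure. In UMLS the three kinds of code are known as concept, term and string unique identifiers (CUI, LUI, SUI). The sketch below is a minimal illustration under that reading only: the class names, identifiers and strings are hypothetical placeholders, not real Metathesaurus data.

```python
from dataclasses import dataclass, field


@dataclass
class String:
    """Lowest level: one lexical string with its own (hypothetical) code."""
    string_id: str
    text: str


@dataclass
class Term:
    """Middle level: a concept name grouping closely related strings."""
    term_id: str
    strings: list[String] = field(default_factory=list)


@dataclass
class Concept:
    """Top level: a concept grouping all names that share its meaning."""
    concept_id: str
    terms: list[Term] = field(default_factory=list)


def build_index(concepts: list[Concept]) -> dict[str, str]:
    """Map every lexical string to the code of the concept it expresses."""
    index: dict[str, str] = {}
    for concept in concepts:
        for term in concept.terms:
            for s in term.strings:
                index[s.text] = concept.concept_id
    return index


# Hypothetical codes and strings, for illustration only.
example = Concept(
    "C-HYPO-1",
    [
        Term("L-HYPO-1", [String("S-HYPO-1", "Kidney Stone"),
                          String("S-HYPO-2", "kidney stones")]),
        Term("L-HYPO-2", [String("S-HYPO-3", "Renal calculus")]),
    ],
)
index = build_index([example])
print(index["kidney stones"], index["Renal calculus"])
```

Because every string points upward to a single concept, synonymous strings from different source vocabularies resolve to the same concept code.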

The Metathesaurus describes several types of relationship between concepts. An example record from the relationship file is given below.
C0001430 | CHD | C0022134 | isa | MSH97 | MTH
This indicates that there is an is-a relationship between the concept nesidioblastoma (C0001430) and the concept adenoma (C0022134), that the former is a child of the latter, that the relationship is taken from the MeSH subject headings (MSH97), and that this relationship was created specifically for the Metathesaurus (MTH).
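Records of this kind are pipe-delimited, so reading them programmatically amounts to splitting on the delimiter and naming the fields. The sketch below assumes exactly the six-field layout of the example record; the field names are descriptive labels chosen for this illustration, not an official file specification, and the code expansion table covers only the code seen in the example.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    # Field names are assumptions; order follows the example record.
    concept1: str
    relation: str
    concept2: str
    relation_attribute: str
    source: str
    label_source: str


# Expansion for the relationship code used in the example (extend as needed).
RELATION_NAMES = {"CHD": "child"}


def parse_relationship(line: str) -> Relationship:
    """Split one pipe-delimited record into its six fields."""
    fields = [part.strip() for part in line.strip().strip("|").split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return Relationship(*fields)


record = parse_relationship("C0001430 | CHD | C0022134 | isa | MSH97 | MTH")
print(record.relation, RELATION_NAMES.get(record.relation), record.source)
```

The parser deliberately does not interpret the direction of the relationship; it only exposes the fields so that the reading given in the text can be applied on top.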

The Semantic Network

The semantic network contains information about the semantic types that are assigned to the concepts in the Metathesaurus. The types are defined explicitly by textual descriptions and implicitly by their position in the hierarchies. Semantic types are represented as nodes and the relationships between them as links. Relationships are stated at the highest possible level of the hierarchy; as a result, they hold between very general classes rather than explicitly between individual concepts.

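In the semantic network, relations are stated between semantic types. The primary relation is hyponymy (is-a), but there are also five major categories of non-hierarchical relations: physically related to, spatially related to, temporally related to, functionally related to and conceptually related to.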
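Because a relation is stated only once, at the most general pair of types to which it applies, checking whether it holds for more specific types means walking up the is-a hierarchy. The following toy network is a minimal sketch of that inheritance: the type names and links are illustrative and are not an extract of the actual UMLS Semantic Network, and the function names are hypothetical.

```python
# Toy semantic network: type names and links are illustrative only.
ISA = {
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
    "Body Part": "Anatomical Structure",
}

# Non-hierarchical relations, stated once at the highest applicable level.
RELATIONS = {("Anatomical Structure", "location_of", "Biologic Function")}


def ancestors(semantic_type: str) -> list[str]:
    """Return the type itself followed by all of its is-a ancestors."""
    chain = [semantic_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def relation_holds(subject: str, relation: str, obj: str) -> bool:
    """A relation holds between two types if it is stated for any pair of their ancestors."""
    return any(
        (a, relation, b) in RELATIONS
        for a in ancestors(subject)
        for b in ancestors(obj)
    )


print(relation_holds("Body Part", "location_of", "Disease or Syndrome"))
print(relation_holds("Disease or Syndrome", "location_of", "Body Part"))
```

The first query succeeds only through inheritance, since no link is stated directly between the two specific types; the second fails because relations are directional.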

The following example, taken from SNOMED, shows how terms can be decomposed into their component concepts and positioned in the hierarchy.

D-33-- Open Wounds of the Limbs
DD-33620 Open wound of knee without complication 891.0
DD-33621 Open wound of knee with complication 891.1
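In this example the hierarchy is carried by the codes themselves: the two knee entries share the numeric prefix "33" of the more general "Open Wounds of the Limbs" entry, and the trailing numbers appear to be cross-references to ICD-9-CM codes. The sketch below recovers the hierarchy from the codes alone under a simplifying assumption of its own: a code subsumes another when its digits, with trailing "-" placeholders removed, are a proper prefix of the other's. This rule ignores the letter prefix (D versus DD) and is not the full SNOMED coding scheme.

```python
# Entries copied from the example above: (code, term, cross-reference).
ENTRIES = [
    ("D-33--", "Open Wounds of the Limbs", None),
    ("DD-33620", "Open wound of knee without complication", "891.0"),
    ("DD-33621", "Open wound of knee with complication", "891.1"),
]


def digits(code: str) -> str:
    """Numeric part after the first hyphen, with trailing '-' placeholders removed."""
    return code.split("-", 1)[1].rstrip("-")


def is_ancestor(parent: str, child: str) -> bool:
    """Assumed rule: a code subsumes another if its digits are a proper prefix."""
    p, c = digits(parent), digits(child)
    return p != c and c.startswith(p)


def parents(code: str) -> list[str]:
    """All listed codes that subsume the given code."""
    return [other for other, _, _ in ENTRIES if is_ancestor(other, code)]


for code, term, xref in ENTRIES:
    print(f"{code:<9} {term} (broader: {parents(code)}, cross-ref: {xref})")
```

Under this rule the two knee entries are siblings (neither subsumes the other) and both are hyponyms of the limb entry, which is the kind of implicit hierarchy a coding system can encode.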

The Specialist Lexicon

The specialist lexicon provides detailed syntactic information about biomedical terms and common English words. An individual lexical entry is created for each spelling variant and syntactic category of a word. These entries are then grouped together to form a unit record for each word, defined in a frame structure consisting of slots and fillers. Full morphological and syntactic information is provided, such as syntactic category, number, gender, tense, adjectival type and noun type.
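A frame of slots and fillers maps naturally onto a dictionary in which a slot may take several fillers. The sketch below parses a hypothetical unit record written in a simple "slot=filler" notation; the slot names, the braces and the convention that each "cat" slot opens a new per-category entry are assumptions made for this illustration, not the lexicon's documented file format.

```python
from collections import defaultdict

# Hypothetical unit record: word-level slots first, then one sub-frame
# per syntactic category, each opened by a "cat" slot.
RECORD = """\
{base=anaesthetic
spelling_variant=anesthetic
cat=noun
variants=reg
cat=adj
position=attrib
}"""


def parse_frame(text: str) -> dict:
    """Parse one unit record into word-level slots and per-category entries."""
    word: dict[str, list[str]] = defaultdict(list)
    entries: list[dict[str, list[str]]] = []
    for line in text.strip().strip("{}").splitlines():
        line = line.strip()
        if not line:
            continue
        slot, _, filler = line.partition("=")
        if slot == "cat":
            entries.append(defaultdict(list))  # start a new category entry
        target = entries[-1] if entries else word
        target[slot].append(filler)  # a slot may carry several fillers
    return {"word": dict(word), "entries": [dict(e) for e in entries]}


record = parse_frame(RECORD)
print(record["word"]["base"], [e["cat"][0] for e in record["entries"]])
```

Keeping fillers as lists lets repeated slots (for example several inflectional variants) be stored without losing information.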

The Information Sources Map

The information sources map contains details of the original sources of the terms. It consists of a database of records describing the information resources, with details such as scope, probable utility and access conditions. The sources themselves are varied and include bibliographic databases, factual databases and expert systems.
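Since the map is a database of descriptive records, selecting candidate resources for a query reduces to filtering those records. The sketch below is a minimal illustration only: the field names follow the details listed in the text, while the records, their values and the selection function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """One record describing an information resource (fields assumed)."""
    name: str
    kind: str  # e.g. "bibliographic", "factual", "expert system"
    scope: str
    probable_utility: str
    access: str


def select(records: list[SourceRecord], kind: str, topic: str) -> list[str]:
    """Names of resources of the given kind whose scope mentions the topic."""
    return [r.name for r in records
            if r.kind == kind and topic.lower() in r.scope.lower()]


# Hypothetical records, for illustration only.
RECORDS = [
    SourceRecord("Source A", "bibliographic", "journal literature on oncology", "high", "online"),
    SourceRecord("Source B", "factual", "drug interactions", "medium", "licence required"),
    SourceRecord("Source C", "factual", "oncology treatment protocols", "high", "restricted"),
]
print(select(RECORDS, "factual", "oncology"))
```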

Comparison with Other Lexical Databases

SNOMED thus represents an implicit hierarchy of medical terms and their relationships by means of a coding system, enabling the identification of synonyms, hyponyms and hyperonyms. This makes it related to WordNet, although its coverage is very different. All the information in SNOMED is contained in UMLS, but represented in a more explicit tree-like structure. However, UMLS also includes information from a wide variety of other sources, and establishes relationships between these. UMLS further provides explicit morphological, syntactic and semantic information.

Relations to Notions of Lexical Semantics

As with the Higher-Level Ontologies discussed in the previous section, the structuring into synonyms, hyponyms and hyperonyms relates UMLS to the cognitive taxonomic models referred to in §2.7.

LE Uses

NLM and many other research groups and institutions use UMLS in a variety of applications, including natural language processing, information extraction and retrieval, document classification and the creation of medical data interfaces. NLM itself uses it in several applications [UMLS97], including Internet Grateful Med, an assisted interactive retrieval system; SPECIALIST, an NLP system for processing biomedical information; and the NLM/AHCPR Large-Scale Vocabulary Test.

EAGLES Central Secretariat