
Natural Language Generation

Introduction

The automatic production of natural language texts is becoming increasingly important as demand grows for technical documents in multiple languages, for intelligent help and tutoring systems that are sensitive to the user's knowledge, and for hypertext that adapts to the user's goals, interests and prior knowledge as well as to the presentation context. This section outlines the problems, stages and knowledge resources in natural language generation.

Survey

Natural Language Generation (NLG) systems produce language output (ranging from a single sentence to an entire document) from computer-accessible data, usually encoded in a knowledge base or database. Often the input to a generator is a high-level communicative goal to be achieved by the system (which acts as a speaker or writer). During the generation process, this high-level goal is refined into more concrete goals which give rise to the generated utterance. Consequently, language generation can be regarded as a goal-driven process which aims at adequate communication with the reader/hearer, rather than as a process aimed entirely at the production of linguistically well-formed output.
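
The refinement of a high-level goal into more concrete goals can be illustrated with a minimal sketch. It is not taken from any of the systems cited in this section: the goal names, the REFINEMENTS table and the refine function are hypothetical, and a real planner would also check constraints on when a refinement applies.

```python
# Hypothetical refinement table: each abstract goal maps to more concrete sub-goals.
# Goals that do not appear as keys are treated as primitive (directly expressible).
REFINEMENTS = {
    "explain(printer_jam)": ["describe(cause)", "instruct(fix)"],
    "instruct(fix)": ["instruct(open_tray)", "instruct(remove_paper)"],
}


def refine(goal, refinements=REFINEMENTS):
    """Expand a goal top-down, depth-first, into an ordered list of primitive goals."""
    subgoals = refinements.get(goal)
    if subgoals is None:
        # No refinement known: the goal is concrete enough to be realised.
        return [goal]
    primitive = []
    for sub in subgoals:
        primitive.extend(refine(sub, refinements))
    return primitive


if __name__ == "__main__":
    print(refine("explain(printer_jam)"))
```

Under these assumptions, the single goal explain(printer_jam) is expanded into three primitive goals, each of which could then be handed to later stages for realisation as a sentence.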

Generation Sub-Tasks

In order to structure the generation task, most existing systems divide it into the following stages, which are often organised in a pipeline architecture:

Content Determination and Text Planning: This stage decides which information should be conveyed to the user (content determination) and how that information should be rhetorically structured (text planning). Many systems perform these tasks simultaneously because rhetorical goals often determine what is relevant. Most text planners have hierarchically-organised plans and apply decomposition in a top-down fashion following AI planning techniques. However, some planning approaches (e.g., schemas [McK85], Hovy's structurer [Hov90]) rely on previously selected content, an assumption which has proved inadequate for some tasks (e.g., a flexible explanation facility [Par91,Moo90]).
Surface realisation: This stage generates the individual sentences in a grammatically correct manner, handling, e.g., agreement, reflexives and morphology.
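
A pipeline of this kind might be sketched as follows. This is a toy illustration under strong assumptions, not the design of any cited system: the Fact record, the KNOWLEDGE_BASE contents, the relevant_to tag and all function names are hypothetical, text planning is reduced to a simple ordering, and morphology is reduced to adding "s".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str       # noun lemma, e.g. "printer"
    plural: bool       # number of the subject, needed for agreement
    verb: str          # verb lemma, e.g. "need"
    complement: str    # rest of the clause, already in surface form
    relevant_to: str   # hypothetical tag linking the fact to a communicative goal


KNOWLEDGE_BASE = [
    Fact("cartridge", True, "contain", "toner", "maintenance"),
    Fact("printer", False, "need", "a new cartridge", "maintenance"),
    Fact("printer", False, "support", "duplex printing", "features"),
]


def determine_content(goal, kb):
    """Content determination: keep only the facts tagged as relevant to the goal."""
    return [fact for fact in kb if fact.relevant_to == goal]


def plan_text(facts):
    """Text planning (toy): put facts about singular subjects before plural ones."""
    return sorted(facts, key=lambda fact: fact.plural)


def realise(fact):
    """Surface realisation: inflect noun and verb so that they agree in number."""
    noun = fact.subject + "s" if fact.plural else fact.subject
    verb = fact.verb if fact.plural else fact.verb + "s"
    return f"The {noun} {verb} {fact.complement}."


def generate(goal, kb=KNOWLEDGE_BASE):
    """Run the three stages in sequence, as in a pipeline architecture."""
    return " ".join(realise(fact) for fact in plan_text(determine_content(goal, kb)))


if __name__ == "__main__":
    print(generate("maintenance"))
```

Each stage only consumes the output of the previous one, which is what makes the architecture a pipeline. It also shows the limitation noted above: plan_text cannot ask for content that determine_content has already discarded.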

However, there is no agreement in the NLG community on the exact problems addressed in each of these steps; they vary between approaches and systems.

Knowledge Sources

In order to make these complex choices, language generators need various knowledge resources:

discourse history - information about what has been presented so far. For instance, if a system maintains a list of previous explanations, then it can use this information to avoid repetitions, refer to already presented facts or draw parallels.
domain knowledge - taxonomy and knowledge of the domain to which the content of the generated utterance pertains.
user model - specification of the user's domain knowledge, plans, goals, beliefs, and interests.
grammar - a grammar of the target language which is used to generate linguistically correct utterances. Some of the grammars which have been used successfully in various NLG systems are: (i) unification grammars: Functional Unification Grammar [McK85], Functional Unification Formalism [McK90]; (ii) Phrase Structure Grammars: Referent Grammar (GPSG with built-in referents) [Sig91], Augmented Phrase Structure Grammar [Sow84]; (iii) systemic grammar [Man83]; (iv) Tree-Adjoining Grammar [Jos87,Nik95]; (v) Generalised Augmented Transition Network Grammar [Sha82].
lexicon - a lexicon entry for each word, containing typical information like part of speech, inflection class, etc.
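
How some of these resources interact can be sketched in a few lines. The example below is an illustration only: the class names, the DOMAIN dictionary and the explain function are hypothetical, and real user models and discourse histories are far richer than a set and a list.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    # Concepts the user is assumed to know already.
    known_concepts: set = field(default_factory=set)


@dataclass
class DiscourseHistory:
    # Concepts explained so far, in order of presentation.
    presented: list = field(default_factory=list)

    def already_presented(self, concept):
        return concept in self.presented

    def record(self, concept):
        self.presented.append(concept)


# Hypothetical domain knowledge: concept -> canned explanation.
DOMAIN = {
    "toner": "Toner is the powder a laser printer fuses onto paper.",
    "drum": "The drum transfers toner onto the page.",
}


def explain(concept, user, history, domain=DOMAIN):
    """Choose what to say about a concept using the user model and the discourse history."""
    if concept in user.known_concepts:
        return ""  # the user model says no explanation is needed
    if history.already_presented(concept):
        # Refer back to the earlier explanation instead of repeating it.
        return f"(See the earlier explanation of {concept}.)"
    history.record(concept)
    return domain[concept]


if __name__ == "__main__":
    user = UserModel(known_concepts={"drum"})
    history = DiscourseHistory()
    for concept in ["toner", "drum", "toner"]:
        print(repr(explain(concept, user, history)))
```

The first request for "toner" yields the full explanation, the request for "drum" yields nothing because the user model marks it as known, and the second request for "toner" yields a back-reference rather than a repetition.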

The formalism used to represent the input semantics also affects the generator's algorithms and its output. For instance, some surface realisation components expect a hierarchically structured input, while others use non-hierarchical representations. The latter solve the more general task, in which the message is almost free from any language commitments and all syntactically prominent elements are selected from both conceptual and linguistic perspectives. Examples of different input formalisms are: hierarchy of logical forms [McK90], functional representation [Sig91], predicate calculus [McD83], SB-ONE (similar to KL-ONE) [Rei91], conceptual graphs [Nik95].
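
The contrast between hierarchical and non-hierarchical input can be made concrete with a small sketch. The representations below are invented for illustration and do not reproduce any of the cited formalisms: HIERARCHICAL is a nested structure whose root already commits to a head ("repair"), while flatten turns it into an unordered collection of triples in which no element is privileged, leaving that choice to the generator.

```python
from itertools import count

# Hypothetical hierarchical input: the nesting fixes "repair" as the head of the message.
HIERARCHICAL = {
    "pred": "repair",
    "agent": {"pred": "technician"},
    "patient": {"pred": "printer", "mod": {"pred": "old"}},
}


def flatten(node):
    """Convert a nested structure into (node_id, relation, value) triples."""
    triples = []
    ids = count(1)

    def visit(current):
        node_id = f"n{next(ids)}"
        for relation, value in current.items():
            if isinstance(value, dict):
                # Nested node: link to it by identifier rather than by embedding.
                triples.append((node_id, relation, visit(value)))
            else:
                triples.append((node_id, relation, value))
        return node_id

    visit(node)
    return triples


if __name__ == "__main__":
    for triple in flatten(HIERARCHICAL):
        print(triple)
```

A realiser working from the triples would have to decide for itself which node becomes the main verb and which become arguments or modifiers, which is the extra work implied by the more general task described above.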

Related Areas and Techniques

Machine Translation (§4.1), Text Summarisation (§4.4).


EAGLES Central Secretariat eagles@ilc.cnr.it