Parallel corpora


A parallel corpus is a collection of texts, each of which has been translated into one or more languages other than the original. In the simplest case only two languages are involved, and one corpus is an exact translation of the other. Some parallel corpora, however, exist in several languages. Nor need the direction of translation be constant: some texts in a parallel corpus may have been translated from language A to language B and others the other way around. The direction of translation may not even be known.
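
The properties just described (several languages, a direction that varies from text to text, and a direction that may be unknown) can be made concrete with a small sketch. The class name ParallelText, its fields, and the sample sentences are hypothetical, chosen only for illustration; they do not come from any particular corpus or toolkit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ParallelText:
    """One text of a parallel corpus (hypothetical structure for illustration)."""
    versions: Dict[str, str]               # language code -> text in that language
    source_language: Optional[str] = None  # None when the direction is unknown

    def __post_init__(self) -> None:
        # The original language, if recorded, must be one of the stored versions.
        if self.source_language is not None and self.source_language not in self.versions:
            raise ValueError("source_language must be one of the languages in versions")

    def direction(self, lang_a: str, lang_b: str) -> Optional[Tuple[str, str]]:
        """Return (source, target) for the pair, or None if it cannot be determined."""
        if lang_a not in self.versions or lang_b not in self.versions:
            raise KeyError("both languages must be present in this text")
        if self.source_language == lang_a:
            return (lang_a, lang_b)
        if self.source_language == lang_b:
            return (lang_b, lang_a)
        # Direction unknown, or both versions were translated from a third language.
        return None


# Invented sample data: the direction differs from text to text.
corpus = [
    ParallelText({"en": "The session is open.", "fr": "La séance est ouverte."}, source_language="en"),
    ParallelText({"en": "Thank you.", "fr": "Merci."}, source_language="fr"),
    ParallelText({"en": "Good morning.", "fr": "Bonjour."}),  # direction not recorded
]
directions = [text.direction("en", "fr") for text in corpus]
```

Recording the original language per text, rather than once for the whole corpus, is one way to accommodate a corpus in which the direction of translation is mixed or only partly known.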

Parallel corpora are of current interest because they make it possible to align an original with its translation and so gain insight into the nature of translation. It is hoped that this work will lead to tools that aid translation. Probabilistic machine translation systems can, moreover, be trained on such corpora. Parallel corpora arise naturally from the business of communication in multilingual organizations and societies, such as the United Nations, NATO, the EU, and officially bilingual countries such as Canada.
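
As a rough illustration of how a probabilistic system might be trained on sentence-aligned parallel text, the sketch below estimates word translation probabilities with expectation-maximization in the style of IBM Model 1. It is a simplified sketch under stated assumptions, not a description of any particular system: the function name train_model1 and the three toy sentence pairs are invented for the example, the sentences are assumed to be already aligned and tokenized, and the NULL source word of the full model is omitted.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] by EM (simplified IBM Model 1).

    pairs: list of (source_tokens, target_tokens), one entry per aligned sentence pair.
    """
    target_vocab = {word for _, target in pairs for word in target}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform probabilities

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target_word, source_word) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: share each target word among the source words of its sentence.
        for source, target in pairs:
            for f in target:
                norm = sum(t[(f, e)] for e in source)
                for e in source:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Invented toy data: English source sentences with German translations.
pairs = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = train_model1(pairs)
```

On this toy data the estimates come to favor "das" as the translation of "the" and "Buch" as the translation of "book", purely from co-occurrence across the aligned sentence pairs.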