
Corpus and text typology

Two complementary reports are presented for this task: one concerned with corpus typology (EAGLES, 1996e) and a second one dealing with text typology (EAGLES, 1996g).

The document on Corpus typology starts with definitions that distinguish a corpus from collections that should not be considered one. It then considers several characteristics as a basis for distinguishing a corpus from a special corpus: quantity, quality, simplicity and documentation. Finally, it defines specific types of collection, such as spoken corpora, samples, sublanguage corpora, reference corpora, monitor corpora, and parallel and comparable corpora.

The document on Text typology, to be read in conjunction with the previous report, distinguishes internal and external criteria for the classification of texts included in a corpus, and discusses in detail notions such as topic and genre. A typology of texts is proposed on the basis of both types of criteria. Topic and style are also discussed extensively as part of the internal linguistic criteria. Spoken language corpora are also addressed. Finally, an Appendix shows the features that different classification systems used in European corpora have in common. This Appendix should be read together with the main document.