
Deprecated terms

Words such as collection and archive refer to sets of texts that need not be selected, need not be ordered, or whose selection and ordering need not rest on linguistic criteria. They are therefore quite unlike corpora.

Citations are individual instances of words in use, and collections of these also have no claim to be corpora. The precise conditions for a valid sample size for a corpus are indeed under discussion (see later), but no-one seriously concerned with corpora has attempted to gather a collection of citations and announce it as a corpus. What has happened is that owners of previously gathered citation collections have tried to use them as a bridge between traditional practice, particularly in lexicography, and corpus-based work.

It is unhelpful to confuse categories in this way, and it is important to assert minimal criteria for use of the word 'corpus'.



Converted into html by Alessandro Enea
Mon May 15 10:24:42 DFT 1995