Both external and internal criteria are therefore to be considered in the classification of texts. A typology of texts for inclusion in corpora cannot be based entirely on external or internal criteria. It is not feasible to classify texts entirely on internal evidence, nor is it desirable if we are to learn about the relation between linguistic and non-linguistic features. Similarly, a classification system based purely on external evidence would not necessarily group linguistically similar texts together, and might impose divisions on texts that do not reflect linguistic differences.
Many internal and external criteria are likely to be broadly correlated, and it is not always straightforward to decide which phenomena are best treated by one or the other. To complicate the issue, there is an in-between category, which we term reflexive (Lyons, 1977). This is where a text talks about itself, and proposes its own classification. Reflexivity is a property of all languages, and is the basis of much of what are conventionally regarded as external criteria. The title page of a novel, for example, will contain the name of the author, and the date may appear on it. These will be accepted as external `facts', unless challenged. The title page may well also say ``A Novel'', which although more contentious is probably acceptable as a genre label. The existence of such proposals in the text itself is not incontrovertible evidence of the accuracy of the classification, but for the control of large corpora there is no practical alternative.
We therefore divide external criteria into two varieties: reflexive and circumstantial.
Reflexive evidence should wherever possible be cross-checked against circumstantial evidence.
The example of `novel' above brings into focus another useful group of terms. In establishing a typology for texts in computer corpora, we should not forget categories of classification that are already in use. It is expedient to reuse terms that are already accepted as categorisers in literary criticism, rhetoric, etc., even when they cut across the hierarchical organisation of this typology. For example, a novel is a (E.22.214.171.124.1.) book and its audience is the general public (E.3.1.*.2.1.); it is literary-recreational (E.3.2.4.). Terms such as novel are asterisked to indicate that they are terms externally defined (TEDs).