Here we will develop the argument that two central parameters of the classification of texts are better described using internal, or text-linguistic, rather than external, or sociocultural, criteria.
The parameters are topic and style. Attempts to analyse them in order to provide a basis for classification are usually expressed in terms of external criteria and are unsatisfactory on several counts. They tend to be unsupported by scientific evidence, and therefore unreplicable; they are also unsupported by wide general agreement, so that we see several different versions co-existing without good criteria for preferring one to the others. There is no control over the granularity of the classification, and there are not enough dimensions for the very large amount of possible cross-classification. The prospect of topic and style being suitable for attribute/value analysis is extremely remote, and such an analysis is unattainable in the present state of our knowledge.
External criteria follow distinctions and classifications that are already available in the culture, both of documents and of speech events. They may not use all of a pre-existing classification, and they may use facts in the culture that are not traditionally used to classify texts, but their defining characteristic is that the basis for an external criterion lies in the culture and not in the language.
For example, consider the category of `quality newspaper'. This is an informal term that is mutually exclusive with `tabloid', but there may be other categories as well. In the UK, for example, price used to be an important distinction, but there is currently a price war going on and this is no longer a sound criterion. A few years ago quality newspapers had no news on their front page, but only advertisements and notices. Tabloid newspapers use a sheet size that is approximately half that of quality newspapers. The editorial in a tabloid consists of a few pithy sentences on an issue, while in the quality papers the editorial can be a learned essay of two or three thousand words setting out the editor's view.
All these are external criteria. The point is that one does not look inside the papers at the kind of language they use in order to classify them, although this would be quite simple and rewarding. There is, for example, a well-known style of headline-writing in some of the tabloids, using short words and very few grammatical ones (Nude Peer in Sex Storm), which might form an internal criterion, but that is not considered in the tradition of classification.
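To make the idea of an internal criterion concrete, the headline observation can be sketched as a small measurement on the text itself. The following is only an illustration under stated assumptions: the function name `headline_profile` and the short `FUNCTION_WORDS` list are hypothetical, chosen for this sketch, and a serious study would need a principled list of grammatical words and a large sample of headlines.

```python
# Illustrative sketch only: a crude internal (text-linguistic) profile of a headline.
# FUNCTION_WORDS is a hypothetical, deliberately tiny list of grammatical words.
FUNCTION_WORDS = {
    "a", "an", "the", "in", "on", "of", "to", "at", "by", "for", "with",
    "and", "but", "or", "is", "are", "was", "were", "has", "have",
}


def headline_profile(headline):
    """Return mean word length and proportion of grammatical words."""
    # Split on whitespace, strip surrounding punctuation, and lower-case.
    words = [w.strip(".,;:!?'\"()").lower() for w in headline.split()]
    words = [w for w in words if w]
    if not words:
        return {"mean_word_length": 0.0, "function_word_ratio": 0.0}
    mean_length = sum(len(w) for w in words) / len(words)
    function_count = sum(1 for w in words if w in FUNCTION_WORDS)
    return {
        "mean_word_length": mean_length,
        "function_word_ratio": function_count / len(words),
    }


profile = headline_profile("Nude Peer in Sex Storm")
print(profile)  # mean word length 3.6, function-word ratio 0.2
```

Both figures are derived from the language of the headline alone, without reference to price, sheet size or any other fact about the newspaper; whether such figures separate tabloids from quality papers is an empirical question that this sketch does not answer.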