NOUNS
A DETAILED ABSTRACT.
This is a report on a two-phase study of the semantics and syntax of noun classification in Swahili. Phase I, the topic of the paper, is an investigation of the semantic structure of the noun classes, from a cognitive-semantic perspective. Data for this study include all the nouns from the Standard Swahili-English Dictionary (Johnson 1939), entered into a computer database and subcategorized according to over 75 semantic and morphological criteria. This report proposes a semantic analysis of the upper levels.
Phase II of the study will involve investigation of noun classification and grammatical agreement in contemporary discourse.
1. Introduction
Among systems of linguistic categorization, noun class systems (including systems of grammatical gender, as in German or Arabic) are usually defined as follows:
(a) all nouns in the language are divided into a small and closed set of classes, signalled by inflectional morphology;
(b) the class of a noun is obligatorily co-referenced on other elements in the sentence via grammatical agreement (see e.g. Dixon 1982; Craig 1986).
The phenomenon of noun classification has long been of interest to linguists and anthropologists because understanding the basis for grouping nouns together as members of a class hints at a system of cognitive or cultural classification underlying the system of linguistic classification. However, the question of what, if any, semantic principles can explain the groupings of nouns into classes in Bantu languages has been controversial. The received wisdom is that although some generalizations can be made, there is a lot of arbitrariness in these systems.
This report will suggest that the diagnosis of arbitrariness rests on an overly restrictive definition of what `semantic coherence' means, and that a cognitive-semantic approach reveals more systematicity than might appear at first. This section is a report on research in progress on the semantics and syntax of the noun class system of Swahili. Swahili has a typical Bantu noun class system, but its status as a lingua franca has led to the assimilation of an unusually large number of loanwords from genetically unrelated languages, especially Omani Arabic, Persian, and various Indian languages (and more recently English). The need to accommodate nouns of foreign origin, some of which fit the phonological forms but not the semantic content associated with the various noun classes, has challenged the resources of the system (see Nurse and Hinnebusch 1993, chapter 3). Thus Swahili is an interesting case study for looking at continuity and change in noun class systems.
The report is organized as follows. Section 2 outlines the Swahili noun class system, discusses some earlier work on noun classes in this and related languages, and introduces the theoretical approach that is being used in the present study.
Section 3 explains the methodology of this research. Section 4 presents some results of the study.
Section 5 discusses the relationship between the methodology and the analysis and draws some general conclusions.
2. Noun classification in Swahili.
2.1. Outline of the Swahili noun class system
Swahili, a member of the Sabaki subgroup of Northeast Coast Bantu, has a noun class system that is typical of Bantu languages. All nouns are divided into 11 classes. The class of a noun is signalled by (a) a pair of prefixes attached to the nominal stem, one for singular, one for plural; (b) a characteristic pattern of grammatical agreement, whereby possessive pronouns, demonstratives, verb subject and object prefixes, and other sentence elements coreferential with a noun are assigned a prefix that co-indexes the class of the noun, if it denotes an inanimate object. Sentence elements relating to nouns that denote animate beings are indexed by a special set of "animate concords", regardless of the prefix on the noun. The table below shows the nominal and concordial prefixes associated with the various classes[1]:
(1) Table 1. Swahili nominal and concordial prefixes (some morphophonemic alternations ignored).
Class          Nominal Prefix     Adjectival Pfx       Pronominal Pfx
(traditional   (affixed to        (affixed to          (affixed to V stems as
Bantu          `fixed class'      `variable class'     Subj/Obj; to Dem.Pro.,
numbering)     stems)             stems)               Poss.Pro. &c.)

1              m-                 m-                   a-/m-; yu-; w-; ye- (depends on stem)
2              wa-                wa-                  wa-
3              m-                 m-                   u-
4              mi-                mi-                  i-
5              zero or ji-        zero or ji-          li-
6              ma-                ma-                  ya-
7              ki-                ki-                  ki-
8              vi-                vi-                  vi-
9              zero or n-         zero or n-           i-
10             zero or n-         zero or n-           zi-
11/14          u-                 m-                   u-
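As a small illustration, the nominal prefix column of Table 1 can be encoded as a lookup table and inverted to see which prefixes are shared by more than one class. This is only a sketch: the names `NOMINAL_PREFIXES`, `classes_by_prefix`, and `ambiguous_prefixes` are hypothetical, the zero prefix is represented here by an empty string, and morphophonemic alternations are ignored, as in the table.

```python
from collections import defaultdict

# Nominal prefixes per class, transcribed from Table 1.
# Assumption: "" stands for the zero prefix; alternations are ignored.
NOMINAL_PREFIXES = {
    "1": ["m-"],
    "2": ["wa-"],
    "3": ["m-"],
    "4": ["mi-"],
    "5": ["", "ji-"],
    "6": ["ma-"],
    "7": ["ki-"],
    "8": ["vi-"],
    "9": ["", "n-"],
    "10": ["", "n-"],
    "11/14": ["u-"],
}


def classes_by_prefix(table):
    """Invert the table: map each nominal prefix to the classes that use it."""
    inverted = defaultdict(list)
    for noun_class, prefixes in table.items():
        for prefix in prefixes:
            inverted[prefix].append(noun_class)
    return dict(inverted)


def ambiguous_prefixes(table):
    """Return only the prefixes that are shared by more than one class."""
    return {p: cs for p, cs in classes_by_prefix(table).items() if len(cs) > 1}


if __name__ == "__main__":
    for prefix, classes in ambiguous_prefixes(NOMINAL_PREFIXES).items():
        label = prefix or "zero"
        print(f"{label}: classes {', '.join(classes)}")
```

On the nominal prefix alone, m- is shared by Classes 1 and 3, and the zero prefix by Classes 5, 9, and 10, so a lookup of this kind would need the agreement pattern as a second key to identify a class.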
Inspection of Table 1 shows that noun class cannot be determined solely from the form of the noun: the prefixes for Classes 1 and 3 (m- in both cases) are homonymous; this is also often true of Classes 9, 10, and 5, where the noun may have no prefix at all. The agreement prefixes also show some homonymy. Therefore the definition of `noun class' in Swahili normally involves reference both to the prefix on the noun (if there is one) and to the pattern of grammatical agreement.
2.2. Earlier treatments of the noun classes
As mentioned above, noun classes in Bantu languages are defined in part by the formal marking of the noun (its class prefix), and in part by the association between a set of nouns on the one hand, and a set of `agreement markers' affixed to possessive pronouns, verb stems, etc., on the other.
Although there is a wide range of opinions about whether the noun classes in Swahili and other Bantu languages have semantic content, there is great uniformity on the treatment of grammatical agreement. Agreement is assumed to be a purely syntactic phenomenon, in which the grammatical properties of one element in the sentence (the agreeing element, or `target', in the terminology of Corbett 1991) are determined by those of another element (the `controller', in this case the noun). In other words, it is assumed that `agreement' morphology contributes no independent semantic content to the message being communicated, but is merely a mechanical copying of features of the `controller' onto the `target'. Implicit in this view is the further assumption that `controllers', such as noun stems, `have' fixed grammatical properties (e.g. membership in a particular noun class); without this assumption one could hardly speak of copying features from the noun to, say, a coreferential demonstrative pronoun. Given this prevailing view of concord, the question of meaningfulness of the noun classes in Bantu languages has only been raised with respect to the prefixes attached to noun stems. The question that has been addressed is, can any regular semantic principles be identified to explain the assignment of noun stems to classes? Answers to this question, perhaps predictably, range from no to yes, but the majority opinion lies somewhere in between. Perhaps the most extreme position on the `no' side is that of Irvine Richardson (1967:378), whose assertion that `...it is impossible to prove conclusively by any reputable methodology that nominal classification in Proto-Bantu was indeed widely based on conceptual implication...' is widely quoted, especially by those who disagree with it. 
At the other end of the spectrum are those who have tried to define each noun class in terms of a single abstract meaning, such as Denny and Creider's (1976) analysis of Proto-Bantu, and Zawawi's (1979) analysis of noun classes in Swahili. Denny and Creider argue that Proto-Bantu had two subsystems of categorization, with partially overlapping morphology, one for count nouns and one for mass nouns. The count noun categories are further subdivided into `kind' classes, that identify objects as animate vs. artifact, and `spatial configuration' classes, that subclassify objects according to shape. The mass noun categories make a distinction between `cohesive' (substances that stick together) and `dispersive' (substances composed of dry particles that are readily dispersed). In the same spirit, though differing in content, Zawawi (1979) assigns a single invariant meaning to each noun class prefix in Swahili. Zawawi's analysis is innovative: she abandons the traditional criteria for definition of the classes, pointing out correctly that they are often inconsistent, and even groups together prefixes traditionally treated as homonyms. According to her analysis, the singular classes subclassify nouns as `substance of life' (m-, traditionally divided into Classes 1 and 3), `substance of abstractness' (u-, traditionally defined as a merger of Classes 11 and 14), `comparison of size or manner' (ki-, Class 7), `intensification' (zero ~ ji-, Class 5), `large' (ba-, not a traditional Bantu class), and a residual `catch-all' category (zero ~ n-, Classes 9/10). Unfortunately, analyses of the noun classes in terms of invariant meaning have failed to convince the skeptics, because there are always some examples that conflict with the invariant meanings that are posited. 
For instance, Denny and Creider's definition of Proto-Bantu Class 3 as `solid, extended in one dimension' does not seem to cover terms like *-dImu `ancestral spirit'[5], *-tIma `heart', *-yedI `moon', *-gUba `bellows', *-dIgo `load', *-yInci `daylight', etc. And Zawawi's definition of m- as `substance of life, singular' does not cover terms like mkufu `metal chain', mji `town', mfumbi `irrigation ditch', mlia `stripe', deverbal nouns, etc. The middle-of-the-road position on the semantics of the noun classes is to divide the noun classes into two subsets: a `derived' set of classes, assumed to be meaningful, to which noun stems from any class can be freely assigned with predictable effects on meaning, and an `inherent' set of classes, whose membership is largely arbitrary. Formally, these sets overlap: the same morphology is used for both `inherent' and `derived' class. The `derived' classes include the diminutive classes (with ki-/vi- prefixes, homonymous with the prefixes of Classes 7/8) and the augmentative classes (with zero/ma- prefixes, homonymous with the prefixes of Classes 5/6). With respect to the `inherent' classes, it is usually argued that although some semantic generalizations can be made about the groupings of nouns into classes, there is also a great deal of arbitrariness. It is often surmised that the present, disorganized system is a breakdown of an earlier, more coherent system assumed to have existed in the ancestor language. Examples of studies employing the `inherent/derived' distinction are Givon (1972), for ChiBemba; Heine (1982), who uses the terms `free' vs. `fixed' gender; Reynolds (1989), Reynolds and Eastman (1989), and Nurse and Hinnebusch (1993) for Swahili. 
Although it is useful to distinguish between productive and non-productive processes of noun formation, the `inherent/derived' distinction ignores the question whether there are any semantic regularities in `inherent' classes, and also ignores semantic relationships, if any, between `inherent' and `derived' class markers. Rarely is any attempt made to connect the various groupings of nouns in a given class with one another, to investigate systematic relationships among different classes, or to explain the exceptions to the generalizations (such as names of animals that are not in the `animal' class). Also, the claim that the modern languages represent a breakdown of an earlier, more coherent system that used to exist in the ancestor language is basically a myth. The Proto-Bantu noun class system also had many apparent anomalies-- if it didn't, there would be no controversy about whether the Proto-Bantu noun classes were meaningful (see also Herbert 1985). In fact, claims about a mythical, semantically transparent system assumed to have existed in an ancestor language are commonplace in discussions of noun categorization, not only in Bantu (cf. Meillet 1923 on gender in Indo-European), yet no modern noun class language is attested with such a transparent system. It seems implausible to attribute a property to an ancestral language that is not found in any language of which we have direct knowledge. But if noun class systems are so full of anomalies, why do they persist for so long essentially intact (in the case of Bantu, some 3000 years or more)? The problem seems to lie not with the languages, but with the assumptions about the nature of linguistic categorization that are brought to bear on this question. It seems to be assumed that either noun class semantics must be defined in terms of a set of common properties shared by all nouns in a given class, or one must abandon the search for semantic coherence and settle for a heterogeneous list.
As pointed out by Lakoff (1987), this assumption is based on a view of linguistic categories as equivalent to sets in Aristotelian logic, which must be defined in terms of a set of necessary and sufficient conditions for membership. This view of linguistic categorization has been widely challenged in recent years, especially from the point of view of Cognitive Grammar (cf. Lakoff 1987; Langacker 1987, 1990; Rudzka-Ostyn 1988). It has been argued that membership in a given linguistic category (for example, a noun class) may be based on multiple criteria, including `family resemblances', metaphor, and metonymy, and that linguistic categories may exhibit an internal structure in which some members of the category are more central, or prototypical, and others are more peripheral. Although work within Cognitive Grammar has tended to concentrate on the semantics of individual lexical items, there are some detailed and illuminating studies of noun categorization that make use of similar insights: the work of Zubin and Koepcke on gender in German (Zubin and Koepcke 1986a, 1986b) and the work of Spitulnik on noun classes in ChiBemba (Spitulnik 1987, 1989). In sum, rather than treating noun class systems as degenerate reflections of an earlier, vanished coherence, we should broaden our conception of `coherence'. From a cognitive-semantic viewpoint even the synchronic systems of the modern languages can be shown to make sense.
3. Methodology
This project is being conducted in two phases. In phase I (almost completed), a database is being compiled of all nouns listed in the Standard Swahili-English Dictionary (Johnson 1939, henceforth SSED), using a commercial database program, DBase IV. Phase II of the project will involve investigation of contemporary usage of the noun class system in connected discourse. The two phases of the project are described in the next two subsections.
3.1 The noun database
As mentioned above, all nouns from the SSED are being entered into a database. So far 4784 nouns have been entered, which includes relatively complete coverage for all classes. Each noun in the database is subcategorized according to a combination of morphological and semantic criteria. The morphological criteria are: (a) noun class affiliation, using the traditional Bantu numbering system; (b) if the noun is derived, the source of the derivation (e.g. verb stem, adjective stem, etc.). Most of the categories of the database are semantic, since this is the purpose of the enterprise. For each noun I have included its dictionary definition, as well as classifying its meaning according to several semantic categories, each constituting a separate `field' of the database. The major categories are HUMAN, ANIMAL, PLANT, SHAPE, SIZE, AFFECT, FORCE OF NATURE (such as wind, rain, etc.), and NUMBER[8]. Within each of these categories further, more specific information is provided. For example, within the field HUMAN the nouns are further subclassified into agentive (denoting the agent of an action), kinship term, religious (e.g. `prophet', `saint'), occupation (e.g. `tailor'), etc. Use of a database has both advantages and disadvantages. The obvious advantage is that the database makes it possible to store and manipulate very large amounts of data, and to sort it in any number of different ways. Thus a few keystrokes can generate a list of all the nouns in Class 5, all nouns referring to animals, all large three-dimensional hollow objects in Class 9, etc. The database can also be used for other purposes besides those for which it was originally designed.
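The kind of query described above (all nouns in Class 5, all nouns referring to animals, and so on) can be sketched with Python's built-in sqlite3 module. This is not the DBase IV database used in the project: the table layout, the column names, and the helpers `build_database` and `nouns_where` are assumptions made for illustration, and the sample records are a handful of nouns cited elsewhere in this report, reduced to a single ANIMAL tag.

```python
import sqlite3

# Hypothetical schema: these column names are illustrative only and are
# not the field names of the original DBase IV database.
SCHEMA = """
CREATE TABLE nouns (
    form       TEXT,
    gloss      TEXT,
    noun_class TEXT,
    animal     INTEGER
)
"""

# Sample records only: nouns cited in this report, with their glosses
# and classes. animal = 1 means the noun is tagged ANIMAL.
SAMPLE = [
    ("kifaru", "rhinoceros", "7", 1),
    ("kiboko", "hippopotamus", "7", 1),
    ("kunguru", "carrion crow", "5", 1),
    ("panzi", "grasshopper", "5", 1),
    ("mkufu", "metal chain", "3", 0),
]


def build_database(rows):
    """Create an in-memory database and load the given noun records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO nouns VALUES (?, ?, ?, ?)", rows)
    return conn


def nouns_where(conn, noun_class=None, animal=None):
    """Return the forms matching a class and/or the ANIMAL tag, sorted."""
    clauses, params = [], []
    if noun_class is not None:
        clauses.append("noun_class = ?")
        params.append(noun_class)
    if animal is not None:
        clauses.append("animal = ?")
        params.append(int(animal))
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT form FROM nouns" + where + " ORDER BY form"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_database(SAMPLE)
    print(nouns_where(conn, noun_class="5"))
    print(nouns_where(conn, noun_class="7", animal=True))
```

Keeping each semantic tag in its own column mirrors the description of each category as a separate `field', so that criteria can be combined freely in a single query.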
A dictionary is a kind of `culture inventory': just looking at which semantic areas are highly differentiated and which are not yields insight into the interests and preoccupations of the speakers of a language. Disadvantages of this research method are both practical and theoretical. On the practical side, entering all the nouns from a dictionary into a database is obviously very time-consuming. But one could also regard this as an advantage: reading the dictionary does allow (or force) one to become intimately familiar with the data. A second practical problem is how to avoid entering redundant records. For example, the SSED sometimes lists derived nouns both as separate entries and as sub-entries under the source of the derivation. This problem was avoided by writing a program for DBase IV that would automatically scan the database for homographic entries each time a new noun was entered, and display all previous examples of the relevant form; new homographs were only entered in cases of homonymy. The database project also raises some theoretical issues. The first and most important is the problem of the semantic categories used to tag the nouns. In order to create a database, one has to anticipate which classificatory categories will be useful before entering the data, in a way guessing at the very analysis that the tool is intended to help discover. Use of a bilingual dictionary potentially adds to this problem, by introducing (or imposing) semantic categories of English that may or may not be relevant to Swahili. How can I be sure that I am not just projecting English-based categories onto Swahili? The short answer is, of course, that there is no general way to guard against this. It is the familiar problem of working from the `etic' to the `emic' (in the terminology of Pike 1967).
In practice, I tried to minimize the problem by drawing on previous work on noun classification in Bantu languages, especially Denny and Creider (1976), Zawawi (1979), and Spitulnik (1987, 1989), as well as cross-linguistic studies of noun categorization, for example Adams and Conklin (1973), Craig (1986). Even so, I found it necessary to modify the database in various ways as I went along. Several tags were added or replaced during the data entry. Of course, modifying the tags for previously entered records is very laborious, but this procedure does permit a kind of dialogic relationship between the data and the tagging process. A second problem, or set of problems, comes from the use of a dictionary as data source. A dictionary is an analysis, not just a description. The compilers make choices about which words to include, how many entries to make for a given form (the familiar problems of polysemy and homonymy), and how to deal with geographic and social variation in pronunciation, grammar, meaning, and usage. Without doing extensive archival research, the user of the dictionary has no way of knowing exactly whose language is represented in it. Moreover, a dictionary is a sociolinguistic act (cf. Hymes 1974 and even more to the point, Fabian 1986). It is produced by people with a certain socio-cultural background, for a certain intended audience, and with certain goals in mind. The SSED, for example, was compiled by a British colonial committee constituted in 1930, composed in part of Christian missionaries and intended for an English-speaking colonial audience (for details, see Whiteley 1969). Without intending criticism of the compilers, who supply a wealth of cultural information about the vocabulary, no dictionary can be free of ethnocentric bias. It is not hard to find obvious examples, e.g. several varieties of fish defined only as `fish, not considered good eating by Europeans'. 
Although examples like this are not typical of the dictionary as a whole, it is still inevitable that in defining a Swahili term for an English-speaking audience of European cultural background, points regarded as worthy of comment or elaboration would be those where there is a perceived contrast between Swahili language/culture and that of English-speaking Europeans. In any case, the only way of compensating for ethnocentrism in the dictionary is to learn as much as possible about the language and culture from other sources as well. A second important problem with the SSED as data source derives from the compilers' goal of standardizing the language. Because dictionaries usually have a prescriptive function as well as a descriptive one, it is hard to determine how much variation in the language is being concealed in order to encourage uniformity of usage. For example, from my knowledge of the contemporary language, I expected to find a fair amount of variation in noun class assignment, especially between Classes 5 and 9, both of which contain large numbers of loanwords, and both of which have zero as the most frequent allomorph of the noun class prefix. However, only about 3% of the nouns in the database were listed in the dictionary as variable in noun class membership. Has the situation changed since the 1930's, or were the dictionary makers trying to impose conformity on the data? There is no way to tell. Also, even for those nouns that are listed as variable, no information is given about the nature of the variation: does it stem from dialect variation? variation among individual speakers? variation based on discourse context? Again, there is no way to tell. For the reasons just outlined, it is desirable to supplement the dictionary material with a wider range of data, especially data from contemporary discourse. 
Discourse data is important because that is the place to look for the areas of uncertainty and variability in meaning and usage that are represented only sporadically in the SSED. Also, this is the place to find neologisms, loanwords, slang, and other innovative usages that may or may not make it into a dictionary. Looking at how these `uncodified' words interact with the noun class/agreement system should shed light on the semantic reality of the noun classes themselves, and on the semantic functions of the agreement system. This is the plan for Phase II of the project.
3.2 Investigation of noun classes in discourse
For Phase II of the project, I plan to use the electronic corpus of Swahili texts that is currently being compiled by the Department of Asian and African Studies of the University of Helsinki, Finland together with the Institute for Kiswahili Research of the University of Dar es Salaam, Tanzania. The corpus, housed at Helsinki, contains prose texts in Standard Swahili, from books and newspapers, and transcriptions of folkloristic material. The texts have not been coded for morphological and syntactic information, but some text retrieving programs are available, which produce concordances with context ranging from a line to a sentence in length[9]. I already have a list of all wordforms in the Helsinki corpus that are not included in the dictionary compiled by the Institute for Kiswahili Research (Taasisi 1981), with their sentential contexts, for which I profusely thank Arvi Hurskainen. This list allows a look at the syntactic behavior of nouns that have not yet been `codified'. A preliminary scan of the data has already uncovered some interesting examples of agreement with conjoined noun phrases, treatment of acronyms, and nouns with variable agreement patterns, which will be the subject of a later paper.
4. Preliminary Results: Analysis of Classes 3, 7, 5, 9, and 11/14
The Swahili noun classes have differing degrees of internal coherence in their semantic structure. At one extreme are Classes 1/2, often called the `human' classes, whose membership consists almost entirely of nouns denoting human beings, especially agents of actions[10]. At the other extreme are Classes 9/10, which have absorbed the majority of foreign loanwords, and may already have been fairly heterogeneous even before the major influx of loans from Omani Arabic, dating only from the 17th century (N + H 1993:320). In this section I will propose an analysis of the classes that fall in the middle, the ones that constitute the nucleus of the system.
4.1 Class 3
Figure 1 is a schematic representation of the semantic structure of Class 3. The top-to-bottom organization of the diagram moves from the more general to the more specific, but the diagram is not intended to be a `taxonomy' in the technical sense (cf. Casson 1981:75-77). I have borrowed the conventions used by Langacker (1988) for the representation of a linguistic category. Langacker defines two basic types of semantic relationship among the elements in a category: (a) relations of `schematicity', in which one element is an `elaboration' or `instantiation' of another, more abstract element (represented by solid lines in the diagram); (b) relations of `extension', in which some feature specifications are suspended or modified, while other features are retained (represented by dotted lines in the diagram). This analysis incorporates many of the insights and observations of earlier studies of Swahili, such as Ashton (1944:23), Polome (1967:97), Hinnebusch (1979:230), Zawawi (1979:chapter 5); Meinhof (1948 [1906]:28-64) anticipated many of these in his comparative grammar of Bantu.
What I have tried to do here is make explicit the connections among the various semantic categories that had been identified by others, and add some categories that have not been mentioned before. The topmost category in the chart, `entities with vitality', is what Langacker (1988) calls a `superschema', a maximally abstract category that holds together the various subcategories. `Vitality' is meant to capture various attributes of living beings, including growing and reproducing (true of plants, and metaphorically of human collectivities), but also ability to move (active body parts), to act on or affect other entities in the world or to occur independently of human volition (supernatural and natural phenomena). The categories `exceptional animals' and `human collectivities' require some additional comments. These categories appear to be in this class not only because of properties that they share with other members of the class, but also because of the opposition between this class and other classes in the Swahili noun class system. First, the animals. In Swahili, as in other Bantu languages, most terms denoting animals are in Classes 9/10. These can be thought of as the `default' classes for animals, especially mammals, virtually all of which are in 9/10 in Swahili. All animal terms that are not in 9/10 are therefore exceptional in some way. If one looks at the distribution of animal terms outside 9/10, the situation is as follows: (a) those in Classes 1/2 (the `human' classes) are either generic terms for whole groups of animal species, agentive nouns, or terms derived from Class 1 nouns (already pointed out in note 8); (b) those in Classes 7/8 denote small animals (see chart of Class 7, below); (c) those in Classes 5/6 denote either animals that are large for their type (such as kunguru `carrion crow', panzi `grasshopper', moma `puff adder'), or non-mammals (e.g. kaya `kind of shellfish', tekenya `jigger, burrowing flea'). 
The animal terms that are in Class 3 are unusual because they do not fit easily into established categories, either because of their appearance (swordfish), their behavior (kingfisher, Golden Weaver finch, termite, cuttle-fish), or a combination of these (the eel is like a snake, but also like a fish; leeches and intestinal worms have an unusually intimate relationship with the human body-- and mjiko `lower bowel' is also in this class, so metonymy could also be operating here). The case of human collectivities is somewhat similar. These are entities that include human beings, but are not themselves human, so they fall somewhere between animate and inanimate. Class 3 is a compromise: it has a human-like prefix (m-), but a non-animate agreement pattern. Among all the subcategories of Class 3, that of plants/trees can be regarded as the most central, for several reasons. First, this subcategory contains the largest number of terms (almost half of all the nouns in the class). Second, it is productive in two ways: loanwords denoting trees and plants are almost always assigned to this class regardless whether the term in the source language had an initial m-; trees and plants form the model for the majority of metaphoric and metonymic extensions in Class 3. The subcategories below the second level on the diagram are interrelated in several ways: many terms fit with equal ease into the subcategories `extended things', `active things', `extended parts of things', and `objects made of plants'. My objective here is to suggest plausible avenues for semantic extensions; these need not be mutually exclusive. To the extent that a term fits into more than one category, it can also be regarded as well-entrenched within the semantic network of Class 3. The most salient aspect of trees and plants, from the point of view of the subcategories associated with them, is shape, i.e. extendedness in one dimension[13]. 
This physical attribute is the basis for the inclusion in this class of some inanimate objects not made of plants, such as `nail', `ramrod', `metal chain', and of extended body parts of humans and animals (`bone', `blood vessels', `sinew', `porcupine spine', etc.). The inclusion of objects made of plants (`wooden platter', `straw mat', etc.), on the other hand, is based on metonymy. The inclusion of long body parts in Class 3 motivates a further extension, by metonymy, to coverings that are wrapped around the body. This seems to be a recent extension within Swahili: none of the nouns in this group are reconstructed for Proto-Sabaki or Proto-Swahili by N + H, and some of them are loanwords from Arabic (but significantly, these did not originally begin with m-). The category `powerful things' includes inanimate objects that have effects on human beings, such as substances with curative properties or religious objects. Some of these are made of plants, so they are connected both to this and to the category `supernatural phenomena', since they derive their power from some agency other than human. `Active things' are things, especially tools, that have movement as a salient characteristic: `arrow', `pestle', `chopper', `loom pedal', etc. In contrast to the `powerful things', they must be set in motion by a human agent. In this way they are similar to the `active body parts', which move but do not have independent volition. Perhaps the most abstract distillation of the `entities with vitality' category is the use of Class 3 to derive deverbal nouns referring to the verbal process itself, such as mparuro `a scratching' (from -parura `to scratch'), mfuo `a hammering' (from -fua `to hammer'), mlio `a sound' (from -lia `to make a sound'), etc. Such nouns describe a process as a thing, and so fit well with the other liminal entities in this class, that fall somewhere between animate and inanimate.
4.2 Class 7
Here is the proposed network for Class 7.
Denny and Creider (1986 [1976]:223) state that the `primary meaning [of Proto-Bantu Class 7] is instrumental artifact'. If one interprets `primary meaning' as `prototypical meaning', I think this is right, and that it is still true of Swahili. I have added to this the specification `small enough to hold in the hand', because this applies to the majority of terms for instrumental artifacts in Class 7, and it provides a motivation for the major semantic extension within this class, to `small entities in general', not all of which are instrumental artifacts. Among `small entities' there are several subgroups, most of which are self-explanatory. I will comment on the ones that are less obvious. First, the category `pieces/parts of things'. Parts or subdivisions of things are smaller than the whole, so this category includes both reference to size and an implicit comparison between the part and the whole (recall Zawawi's 1979:115 definition of this class as `comparison of size or manner'). This part-whole comparison is carried over into a further extension, to `shortened things', that is things that have been truncated through being worn down or cut, and to terms for people with physical defects, conceived as not-whole. These latter terms generally have derogatory connotations in Swahili which, not coincidentally, has a single adjective for `whole', `healthy', and `adult/mature', i.e. -zima. As pointed out by Denny and Creider (ibid.), `it is a fairly natural extension from `used object' to `despised object''. I am arguing here that the metaphor of size plays a role in this extension in Swahili. The salient characteristic of terms in the subcategory `pointed things' is that a point or angle occupies a small amount of space. Even if the whole is large, such as a mountain, the pointed part (`peak') is relatively small.
In the case of pointed parts, too, there is an implicit comparison between part and whole. The category `part of substance' is more abstract than the ones just discussed, but its connection to part-whole relations is nonetheless apparent. This category includes terms denoting subdivisions of time and space. Height, depth, and units of measurement divide and delimit potentially extended spaces or spans of time into measurable parts. Here, too, there is an implicit comparison between the measured entity and the undelimited remainder[14]. Swahili grammars often point out that the prefix ki- is used to derive `adverbs of manner' (cf. Ashton 1944:165; Polome 1967:100), but they make no connection between this `function' and the noun class meaning of ki-. Zawawi's (1979:115) suggestive definition for the Class 7 prefix ki-, `comparison of size or manner', is the first to connect these ideas explicitly, but Zawawi does not explain how they are connected. I believe the link lies in a metaphorical extension of the part-whole relation to qualities or attributes, in a way reminiscent of the English expression `a chip off the old block'. The relationship of similarity can be thought of as a partial overlap in substance between the entities that are regarded as similar. Thus a very sweet banana overlaps in part with the sweetness of sugar; a butterfly that flies slowly partakes, so to speak, in the quality of slowness. The same rationale can be extended to deverbal nouns denoting human beings who habitually perform the action expressed by the verb. Finally, the terms in the category `ailments associated with body parts' are all connected to their respective body parts by metonymic extension. Although some of them are actually associated with small body parts (`inflammation of the eyelid'), one could easily imagine this category being extended to other body parts via metonymy. The above categories account for over 90% of the Class 7 nouns in the database. 
Of the rest, in some cases not enough information is provided in the dictionary gloss to decide whether they fit or not (for example, kilendo `kind of fish'; kilua `kind of sweet smelling flower'-- are they small?). Some may be deverbal nouns or `similarity' nouns whose source is now obsolete (e.g. kisutuo `food received after a task has been completed'; kifabakazi `Nandi flame tree'). Some are loanwords that had ki- as initial syllable, and so may have been placed in this class for phonological reasons. There is one set of apparent anomalies that deserves additional comment, however. This is a group of terms referring to large, dangerous animals or birds: kifaru `rhinoceros', kiboko `hippopotamus', kingugwa `large spotted hyena', kipungu `eagle', and kipanga `Dickinson's falcon'. These terms are strikingly anomalous: why should large, predatory animals be placed in a class whose most prominent characteristic is small size, often with a connotation of insignificance? Interestingly, three of these terms (`hippopotamus', `rhinoceros', and `eagle') are replacements for terms that were originally in Class 9 (the `animal' class) in Proto-Sabaki (N + H 296). In other words, these three animals were moved from the `animal' class to Class 7 (kipanga `Dickinson's falcon' was already in Class 7 in PSA, and N + H do not give a reconstructed form for `spotted hyena'). One possible explanation is that these terms started out as euphemisms. Putting names of large, dangerous animals in the class of small, manipulable things could be a way of figuratively neutralizing or diminishing their power[15].

4.3 Classes 5 and 9

For reasons that will become apparent, it is useful to discuss Class 5 together with Class 9. First, a few remarks about morphology. Historically, nouns of Class 5 used to have a distinctive prefix, reconstructed as phonologically conditioned allomorphs *jI and *I for Proto-Sabaki by N&H (p. 338). However, in most Swahili dialects the prefix (now j- or ji-) has been retained only before vowel-initial and monosyllabic noun stems, which are comparatively infrequent, and indeed it is recognizable as a prefix only with the monosyllabic stems.
Thus most nouns of Class 5 have a zero prefix, and in this respect they are indistinguishable from most nouns of Class 9 (cf. Table 1 in Section 2.1 above). The grammatical distinction between Classes 5 and 9 is maintained not by differences in the form of the noun, but by differences in pluralization and agreement patterns: Class 5 nouns have the prefix ma- in the plural, whereas those of Class 9 do not change in the plural; the pronominal agreement for Class 5 is li-, and that for Class 9 is i-. However, even these criteria for distinguishing between Classes 5 and 9 collapse in the case of nouns denoting animate beings. In Swahili there is a special set of agreement markers, called "animate agreements", that are used with all animate nouns, regardless of their class prefix. Because of this the animate nouns of Classes 5 and 9 are distinguished neither by their prefix (usually zero in both cases) nor by the associated agreements. The tenuous nature of the 5/9 distinction for animates is reflected in a breakdown of the difference in pluralization: many animate nouns with zero prefix may either have ma- or remain invariable in the plural (e.g. rafiki 'friend', plural either marafiki or rafiki). Because the class affiliation of prefixless nouns denoting animates is ambiguous at best, I will leave these aside in my discussion of the semantic structure of Class 5. The fact that Classes 5 and 9 most commonly have a zero prefix might lead one to expect that these classes would be especially hospitable to loanwords, particularly words whose initial syllable does not resemble a recognizable class prefix, a point that has been made by several Swahili scholars. Zawawi (1979:127) suggests that such nouns may first be incorporated into Class 9, and may later be recategorized as Class 5 (the latter has the advantage of distinguishing singular from plural). Eastman (1991:61ff.) argues the reverse: that loanwords start out in Class 5 and are later recategorized as 9. 
N&H (1993) do not argue for one direction over the other, but point out that loanwords often fluctuate between these two classes (p. 355). They also suggest that Classes 5 and 9 have received more loanwords than the other noun classes (p. 309), and that the semantic structures of both 5 and 9 have been equally distorted, resulting in classes that function as semantic "catchalls" (p. 320). From a communicative point of view, such an outcome would seem to be inefficient: whereas it would be useful to have a single semantically miscellaneous class to serve as home for nouns that do not readily fit into any of the other classes, it seems superfluous, indeed confusing, to have two such classes. And in fact, the data from the Johnson dictionary show a different picture. First, the numbers. Here is a breakdown of the percentages of loanwords in the various classes in my database (using only nouns whose class assignment is unambiguous in Johnson 1939, and ignoring class markers used exclusively to form plurals):

Table 2. Allocation of loanwords to noun classes.

Class   Prefix   Total # Nouns   Total # Loans   % Loans
1       m-        334             75             22.5%
3       m-        852            167             19.6%
5       zero      656            166             25.3%
6       ma-       136             22             16.2%
7       ki-       657             55              8.4%
9       zero     1277            690             54.0%
11      u-        260             73             28.1%
[Data from Johnson 1939]

It turns out that the proportion of loanwords in Class 5 is approximately equal to that of Classes 1 and 11, both of which have identifiable prefixes. On the other hand, over half the words in Class 9 are loanwords, a far greater proportion than any other class. These numbers support Zawawi's contention that Class 9 "has now become an open class hosting all those words that come into the language which are not marked semantically or syntactically" (1979:134). Further support for the claim that Class 9 functions as a residual, semantically miscellaneous category can be found in discourse data. For one thing, "nonce" nouns, i.e.
words or expressions that would not normally be classed as nouns but are being treated grammatically as entities in special contexts, are associated with the grammatical agreements of Class 9. For example: 'Karibu!' Najum alimsikia Bwana Msa akijibu 'Hodi' y-ake. [Abdulla 1968:7] 'Come in!' Najum heard Bwana Msa answering his 'hello'. Here hodi, a greeting used to announce one's presence, is being treated as a possessed entity; the 3rd person singular possessive pronoun -ake is marked by (the prevocalic allomorph of) the agreement prefix of Class 9. Similarly, "nonce borrowings", i.e. lexical items from another language that have not (yet) been used with sufficient frequency to be regarded as established loans (cf. Poplack, Sankoff, and Miller 1988), are associated with Class 9 agreements if they denote inanimate objects, unless there are semantic or phonological reasons for putting them in another class [18]. In a list of examples containing wordforms that do not appear in the Taasisi (1981) dictionary, from the Helsinki electronic corpus of Standard Swahili books and newspapers, the vast majority are treated as Class 9 [19]. Examples include terms like probationary appointment, aluminium sulphate, land rover, knock out (in boxing), kansa (=cancer), katalogi, abbreviations such as B.B.C., A.N.C., acronyms such as NASACO, etc. Given the miscellaneous nature of Class 9, there would be little point in trying to create a semantic network for it. No doubt there are semantic regularities in the inherited vocabulary of Class 9, but there is no reason to expect these to extend to the loanwords that by now outnumber the inherited words, or to the nonce forms just discussed. However, the designation of Class 9 as the residual member of the class system should not be interpreted as a breakdown of the semantic coherence of the system as a whole-- quite the contrary. 
It is the very existence of Class 9 that helps maintain the coherence of the rest of the system despite the challenge of assimilating loanwords into Swahili grammar. Now on, or back, to Class 5. Denny and Creider (1976) include Proto-Bantu Class 5 in the set of "configurational" classes, i.e. "prefixes which classify according to the spatial configuration of the objects classified" (p. 3). Within this group they oppose 5 to 9, the former defined as "solid shape", the latter as "outline shape", but both sharing the feature "non-extended" (i.e. "rounded, protruded, bunched, humped, etc.", p. 5). In Swahili it appears that 3-dimensionality is indeed a salient aspect of Class 5, and a large number of the terms denoting solid 3-dimensional objects, protrusions, swellings, and lumpy substances are reconstructed for Class 5 all the way back to Common Bantu. What seems to be an innovation is the inclusion of terms for containers and hollow spaces in Class 5 as well. Although some of the terms for 3-dimensional containers are reconstructed for Proto-Sabaki by N&H (1993) and are apparently inherited from Common Bantu (ganda 'husk, rind, shell'; bia 'earthenware vessel', jiko 'fireplace, hearth'), a larger number are reconstructed for Proto-Sabaki only, have reconstructions that are dubious, are derived from verbs, or have changed their class affiliation from 9 to 5 (koo 'throat', ziga 'vessel for burning embers', kaka 'empty shell', zizi 'cattle enclosure', gamba 'outer shell'). The same is true of terms for hollow spaces, only one of which (kwapa 'armpit') is reconstructed back to Common Bantu. A second major difference between the semantic structure of Class 5 in Swahili and that described by Denny and Creider (1976) for Proto-Bantu is the inclusion of terms for broad, flat surfaces and things with broad parts. 
In this case there is more support for the pre-Swahili existence of the subdomain within Class 5 (terms for 'leaf', 'lake', 'hoe', and 'axe' are reconstructed to Common Bantu), but here, too, the domain seems to have been extended in more recent times. Several of the terms have moved to Class 5 from other classes (para 'bald patch' and panga 'machete', both from 11; kafi 'paddle', from 9), some fluctuate between Class 5 and another class (konde 'cultivated field', also 9; kosi 'nape of neck', also 7); others are reconstructed to Proto-Sabaki only (tanga 'sail', kuti 'coconut leaf', paa 'roof'). The most productive semantic category within Class 5 is the category of terms for fruits. As pointed out by most Swahili grammars, there is a regular relationship between Class 5 and Class 3: a noun stem with the prefix of Class 3 designates a plant, and the same stem in Class 5 designates the associated fruit (for example, mpapai (3) 'papaya plant'/papai (5) 'papaya fruit', mmumunye (3) 'gourd plant'/mumunye (5) 'gourd', etc.). Loanwords also follow this pattern (e.g. mlimau (3) 'lemon tree'/limau (5) 'lemon', from Hindi). The category 'fruit' is productive also in the second sense: it is the basis for several kinds of semantic extension, most notably with respect to shape and size. Fruits are typically 3-dimensional, round, large (in their mature, desirable state), and can be viewed either as solid objects or as containers (skin contains fruit meat as well as seeds). Hence the extension to protrusions, swellings, lumpy substances, 3-dimensional containers, and hollow spaces (by association with containers). These associations-- growth/swelling and containment especially-- in turn motivate a more abstract extension, to large things in general. And in fact a noun stem normally associated with any of the other classes, if put in Class 5, acquires the connotation of large size, sometimes negatively evaluated as clumsy or ungainly.
The use of Class 5 to derive augmentatives apparently can be traced at least as far back as Proto-Northeast Coast Bantu (N&H 342); elsewhere in the Bantu family there are specialized classes for this purpose (classes 20, 21 and 22), for which there is no evidence in PNEC (N&H 346). As in the case of the use of Class 7 to derive diminutives, this function may be seen as a natural extension from the size associated with prototypical members of the class. It is also worth making a further point in this regard. Some might find it counterintuitive that manufactured objects-- baskets, cooking vessels, and the like, which form the nucleus of Class 7-- would have the connotation of small size whereas fruits and vegetables-- which are, after all, "objectively" smaller than many manufactured objects-- would have the connotation of large size. However, this situation simply reinforces one of the central findings of recent research on linguistic categorization: that human beings classify things in the world linguistically according to their human, culturally mediated perspective, not according to "objective" characteristics of the things themselves. Manufactured objects are "small" in relation to the human body: they can be easily picked up and manipulated. Fruits are "large" in relation to their earlier stages of growth, and it is when they become large that they are of most value to humans. Each of these size connotations is "logical" in its own way, even if the resulting categorization conflicts with size as determined by objective measuring principles. Finally, a word about borrowings. I mentioned earlier that Class 9 acts as the "default" class for loanwords, unless there are semantic or phonological reasons to put these in another class. A look at the loanwords in the Class 5 diagram shows that the meanings of these fit in with the overall semantic patterns of the class. 
The data from the Helsinki corpus showed only a few cases of recent loanwords consistently receiving Class 5 agreement rather than that of Class 9. For example, much of the vocabulary of sports is derived from English: terms like pointi, raundi (=round), timu (=team), ligi (=league) are used regularly in Swahili newspapers, and are treated as Class 9. I found three exceptions, all soccer terms: goli 'goal' (in soccer), benchi 'bench' (also in soccer), and shuti 'shot', all with agreement of Class 5. These terms differ in their frequency: goli is very common, whereas I only have one example each of the other two, so it is impossible to tell whether their association with Class 5 is a regular pattern or is motivated by their specific contexts of use. Anyway, it is not hard to find semantic motivations for connecting these words with Class 5. For goli, two motivations can be found: the association of Class 5 with large, important things (goals are the point in soccer), and the prior existence in Class 5 of the nearly synonymous inherited word bao. This term denotes a large board used for a special purpose, especially a playing board for the chess-like game metonymically called bao, and by extension is applied to victory in a game. In fact, bao is also used in sports reporting to mean 'goal'. The term benchi may also be associated with bao, since one of the senses of bao is 'bench or table'. As for shuti, in the example where it is used it refers to a successful shot, i.e. one that scores a goal, so it too is connected to importance and victory. In summary, it seems that Class 5 in Swahili has retained its semantic integrity in spite of having lost its prefix in most contexts, and in spite of the potentially destabilizing effects of loanwords. The semantic structure of 5 has probably been influenced by the development of 9 as residual member of the class system, however. 
If Denny and Creider (1976) are correct in claiming that "non-extended, outline figure" was a feature of Class 9 in Proto-Bantu, then the expansion of 5 into this domain could be motivated not only by the internal structure of 5 (fruits as containers), but also by the loosening of internal coherence in 9.

4.4 Class 11/14

Next, Class 11/14. In Swahili this class is historically derived from a merger of Bantu Class 14 (*bu-) with Class 11 (*lu-). This is a fairly recent development (Nurse and Hinnebusch reconstruct 11 and 14 as distinct classes in Proto-Sabaki). This merger is usually attributed solely to phonological factors (the loss of *l and *b- before -u), and most Swahili scholars describe the resulting class as `semantically confused'. In the words of Nurse and Hinnebusch, `long thin objects (Class 11) and abstracts (Class 14) are hard to reconcile' (N&H 350). However, it must be pointed out that even before the time of Proto-Sabaki there was some semantic overlap between Classes 11 and 14 that is not usually recognized. First of all, both classes contained some nouns referring to two-dimensional surfaces or things metonymically related to them, as may be seen from the lists of examples associated with the chart. Thus the word ulili meaning `platform' or `bedstead' is reconstructed for both classes 11 and 14 by Nurse and Hinnebusch; the rest of the reconstructed words in these two categories are evenly divided between Classes 11 and 14. Another area of semantic overlap between the two original classes is that of cohesive substances, which fall somewhere between solid and liquid.
If spatial delineation in one or two dimensions was a major characteristic of pre-Sabaki Class 11, and if individuation without countability was a characteristic of Class 14, then the cohesive substances may be thought of as a semantic bridge between the two: they preserve their shape, and have an intimate relationship to surfaces, but are not discrete enough to be countable. It seems likely that the semantic structure of this class has been reorganized to accommodate the merger between the former 11 and 14. In the diagram I suggest the term "essences" to indicate a relationship between the more concrete vegetative substances-- sap and fiber-- and the productive use of Class 11 to derive abstractions (for example, mtoto (1) 'child'/utoto (11) 'childhood'; jamaa (9) 'family, society'/ujamaa (11) 'socialism'). A parallel relationship, though not as productive, exists between names of plants (in Class 3) and their usable sap or fiber (the same stem in Class 11), for example, mgomba (3) 'banana plant'/ugomba 'banana plant fiber'; mlimbolimbo (3) 'thorny hedge plant'/ulimbolimbo (11) 'sap of mlimbolimbo, used for fish poison'; msufi (3) 'kapok plant'/usufi (11) 'kapok fiber'. Other word pairs show analogous relationships: the Class 11 prefix indicates inner substance or constitutive part. For example, mfupa (3) 'bone'/ufupa 'bony substance, cartilage'; ndevu (9) 'beard'/udevu 'single facial hair'; taya (5) 'jaw'/utaya (11) 'jawbone'; mti (3) 'tree'/uti (11) 'stem, trunk of tree; backbone'. The fact that inner substances can sometimes be sticky liquids and sometimes fibers can explain the association of Class 11 with extendedness in two dimensions vs. one. As suggested above, the cohesive qualities of sap may motivate the extension to other cohesive substances as well as to 2-dimensional surfaces, with which cohesive substances have an intimate relationship. Fibers on the other hand are long and thin, which explains the connection to long thin things in general[20].
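
The stem-and-prefix pairs cited in this section, together with the plant/fruit pairs of Section 4.3, can be illustrated with a short sketch. It assumes plain concatenation of a simplified singular prefix (m- for Classes 1 and 3, zero for Class 5, ki- for Class 7, u- for Class 11, as listed in Table 2) with a bare stem. The names CLASS_PREFIX and derive are hypothetical, and the sketch deliberately ignores the prefix allomorphy of actual Swahili (for instance before vowel-initial stems), so it is an illustration of the pattern rather than a morphological analyzer.

```python
# Simplified singular prefixes, following the "Prefix" column of Table 2.
# Assumption: no allomorphy; each class has exactly one written prefix.
CLASS_PREFIX = {1: "m", 3: "m", 5: "", 7: "ki", 11: "u"}


def derive(stem: str, noun_class: int) -> str:
    """Attach the simplified singular prefix of noun_class to a bare stem."""
    return CLASS_PREFIX[noun_class] + stem


# (stem, source class, derived class); glosses are those given in the text.
PAIRS = [
    ("toto", 1, 11),   # 'child' / 'childhood'
    ("gomba", 3, 11),  # 'banana plant' / 'banana plant fiber'
    ("sufi", 3, 11),   # 'kapok plant' / 'kapok fiber'
    ("ti", 3, 11),     # 'tree' / 'stem, trunk of tree; backbone'
    ("papai", 3, 5),   # 'papaya plant' / 'papaya fruit'
    ("limau", 3, 5),   # 'lemon tree' / 'lemon'
]

if __name__ == "__main__":
    for stem, source, target in PAIRS:
        print(f"{derive(stem, source)} ({source}) / {derive(stem, target)} ({target})")
```

Keeping the stem constant and varying only the class number mirrors the point made in the text: the same stem shifts its meaning (plant, fruit, fiber, abstraction) according to the class it is placed in.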
The categories `things ground up into particles' and `plants processed into non-solid substances' are connected to spatially delineated but non-countable substances in that they start out as delineated solids but become non-solid, hence non-countable.

4.5 Interclass oppositions

It may be helpful to summarize the relationships among the various classes that have been alluded to in this paper. The following diagram does this schematically:

Table 3. INTERCLASS OPPOSITIONS

CLASS   KIND                   SHAPE                     SIZE               AFFECT
1/2     human                  -                         -                  -
3/4     plants, esp. trees     long, rigid               (large)            -
5       fruits                 round, solid or hollow    large              impressive; ungainly
        leaves                 curved & broad            -                  -
6       aggregates             -                         -                  -
7/8     artifacts              -                         small              cute; insignificant; not-whole
                                                         (large-- ironic)
11/14   sap                    spreading flat            -                  -
        fiber                  long flexible
        essence
9/10    misc. (incl. animal)   -                         -                  -

5. Conclusion

In conclusion, I would like to make two main points about the analysis and methodology described above. First, the principles of cognitive grammar are a useful tool for investigating the semantic structure of noun classes. This approach explicitly recognizes the fact that human beings use linguistic categories to make sense out of the world, and it provides a cognitively motivated framework for describing associative relations among the members of a category. The principles by which different nouns are grouped together into a class are similar to those that govern the connections among the various `senses' of an individual lexical item. These same principles, including metaphor and metonymy, also play an important role in lexical and grammatical change (see also Heine and Claudi 1986). At the same time, analyses along these lines do not attempt to predict the content of a given category or the direction of meaning change. Thus for example one can explain the inclusion of terms for small animals in Class 7, but this does not entail that all terms for small animals must be in this class.
Entities in the world may be classified in myriad ways; small size is just one among many possible criteria for classification, and there is no a priori basis for predicting which characteristics speakers will regard as most salient in a given case. What this type of analysis does show is that the groupings that are found are semantically motivated rather than arbitrary. In this respect it is an advance over the point of view that linguistic categories must either be definable in terms of Aristotle's necessary and sufficient conditions for membership, or dismissed as incoherent. The second conclusion to be drawn from this study concerns the use of the database. It must be emphasized that a database is not a discovery procedure for semantic structure. In fact, comparison of the categories in the diagrams in Section 4 with the tags used in the database shows that the tags are only indirectly reflected in the diagrams. Some tags, such as `body part', `animal', turn out to require greater differentiation, in ways not originally anticipated when the database structure was conceived. Others, such as `human collectivities' or `part of substance', were discovered as a result of inspecting larger groups of nouns, the semantic network as a whole, or the intersections between the noun classes and wider semantic domains. The database is an extremely useful tool but like other tools, its limits are the limits of its users.