2.1 The Great Past Tense Debate

For over 20 years, the English past tense has been the subject of a debate about the nature of language processing. The debate began with the report of a connectionist model by Rumelhart and McClelland (1986) and a critique by Pinker and Prince (1988), and has since been the subject of many papers, conferences and simulation models. The past tense is of theoretical interest because it embraces two strikingly different phenomena. Regular inflection, as in walk-walked and play-played, applies predictably to thousands of verbs and is productively generalized to neologisms such as spam-spammed and mosh-moshed, even by preschool children (Kim et al., 1992). Irregular inflection, as in come-came and feel-felt, applies in unpredictable ways to some 180 verbs, and is seldom generalized; rather, the regular suffix is often overgeneralized by children to these irregular forms, as in maked and doed. A simple explanation is that irregular forms must be stored in memory, whereas regular forms can be generated by a rule that suffixes -ed to the stem (Marcus et al., 1995). Rumelhart and McClelland (1986) challenged that explanation with a pattern-associator model (RMM) that learned to associate phonological features of the stem with phonological features of the past-tense form. It thereby acquired several hundred regular and irregular forms and overgeneralized -ed to some of the irregulars.

2.1.1 Single Mechanism Storage Theory (Connectionism)

Learning the English past tense has been one of the central topics of debate in cognitive science since Rumelhart and McClelland published their original neural network model in 1986. The phenomenon itself is very simple. English verbs fall into two categories: regular and irregular. The past tense of a regular verb is obtained by simply adding -ed to the stem. Irregular verbs, on the other hand, are unsystematic: each verb has a unique inflection. When children learn the inflection of the past tense, they go through three stages. In the first stage, their use of the past tense is infrequent, but when they do use it they do so correctly. In the second stage, they use the past tense more often, but they start overregularizing the irregular verbs: instead of saying broke, they now say *breaked. At the same time, inflection of regular verbs increases dramatically, indicating that the child has somehow learned the general regular pattern. In the third stage, they inflect irregular verbs correctly again. This pattern of learning is often referred to as U-shaped learning.

Although learning the past tense seems rather simple, it bears on a number of issues in language acquisition and in learning in general. The past tense has two aspects: on the one hand there is a general rule, and on the other hand there is a set of exceptions. Children are able to learn both aspects, and the phenomenon of U-shaped learning seems to imply that children learn the general rule in stage 2. The important point Rumelhart and McClelland make is that this does not necessarily imply that this knowledge is actually represented as a rule in the cognitive system: their neural network model has no separate store for rules, yet it exhibits rule-like behavior in the form of U-shaped learning.

This neural network model, also known as the single mechanism storage theory, is quite different from the Words and Rules theory proposed by Pinker in 1991. The former claims that an integrated network maps from the stems of all verbs to their past-tense forms, using a single network of units and connections. For example, in the Rumelhart and McClelland model (Rumelhart & McClelland, 1986), which is a typical connectionist model, the same units and connections that produce regular past tenses from regular stems also process the irregulars, so the network has an inherent tendency to do the same thing to the exceptions that it does to regulars, namely, copy the features of the stem to the past-tense form and add /d/, /t/ or /Id/ depending on the final consonant. To produce kept instead of keeped (note that both end with unvoiced /t/), all that is required is to adjust the activations of the output units representing the vowel, something that the network will have learned to do on the basis of experience with keep and its neighbors creep, leap, sleep, sweep and weep. The network uses the same connection-based knowledge that allows it to perform the regular mapping, and also taps into specific connections activated by the particular properties of keep to produce the vowel adjustment.

The Rumelhart and McClelland model of past-tense inflection consists of a simple pattern associator network that learns the relationship between the phonological forms of the stems and past tenses of English words. This Single Pattern Associator (SPA) theory (Rumelhart & McClelland, 1986) holds that both regular and irregular forms are generated in a pattern associator network in which weighted connections associate phonological and semantic features of stems with phonological and semantic features of their past-tense forms. All processing is accounted for using weighted phonological units (e.g., -ing to -ung for sing, -k to -kt for walk) that are strengthened with exposure and shared across phonologically similar stems, resulting in automatic generalization by similarity. This model contains no lexical entries or grammatical representations.

This network is flanked by a fixed encoding network on the input side and a fixed decoding network on the output side (see Figure 2.1). All learning occurs in the pattern associator. The encoding network simply converts a string of phonemes into the “Wickelfeature” representation used inside the network to represent the stem of each word. Similarly, the decoding network converts the computed Wickelfeature representation of the attempted past-tense response back into a sequence of phonemes. The overall theory within which this model arose assumes that processing is sensitive to meaning and context; for simplicity, such influences were not included in the model.

Figure 2.1 The basic structure of the Rumelhart and McClelland model (Rumelhart & McClelland, 1986)
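
To make the encoding step concrete, the following is a minimal Python sketch of the Wickelphone idea underlying the encoder, under the simplifying assumption that a word is represented only by the set of context-sensitive phoneme triples it contains; the toy transcriptions and the omission of the feature-level recoding are illustrative choices, not details of the original model.

```python
# Simplified sketch of the Wickelphone idea behind the encoding network: a word
# is represented as the unordered set of context-sensitive phoneme triples it
# contains, with '#' marking word boundaries. The original model goes one step
# further and recodes each triple as a vector of "Wickelfeatures" (bundles of
# phonological features); that step is omitted here.

def wickelphones(phonemes: str) -> set:
    """Return the set of phoneme triples for a word, padded with '#' at both ends."""
    padded = "#" + phonemes + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

# Toy transcriptions: "kAm" for the stem "come", "kem" for the past tense "came".
print(wickelphones("kAm"))   # {'#kA', 'kAm', 'Am#'}
print(wickelphones("kem"))   # {'#ke', 'kem', 'em#'}
```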

In the Rumelhart and McClelland model, the pattern associator contains two pools of units. One pool, called the input pool, is used to represent the input pattern corresponding to the root form of the verb to be learned. The other pool, called the output pool, is used to represent the output pattern generated by the model as its current guess as to the past tense corresponding to the root form represented in the inputs. Each unit stands for a particular feature of the input or output string.

For connections, the pattern associator contains a modifiable connection linking each input unit to each output unit. Initially, these connections are all set to zero, so that the input units have no influence on the output units (see Figure 2.2). Learning involves modification of the strengths of these interconnections.

Figure 2.2 Two pools of units in the Pattern Associator (Bruening & Hermon, 1998)

In processing, for a given input, the pattern associator produces an output by a simple neuron-like activation process. Each output unit computes a “net input” based on the current input pattern and the values of the connection weights. The net input is the sum, over all of the incoming connections, of the activation of the sending unit multiplied by the weight of the connection. Each unit also has a modifiable threshold. When the net input exceeds the threshold, the unit tends to be turned on, with a probability that approaches 1 as the net input increases; otherwise, the unit tends to be turned off.
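
The following is a minimal sketch of this activation process for a toy pattern associator; the array sizes, the logistic form of the probability function and the temperature parameter are assumptions made for the sake of a runnable example rather than details of the original model.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W, thresholds, temperature=1.0):
    """Probabilistic activation of the output pool for a binary input pattern x.

    The net input to output unit j is sum_i x[i] * W[i, j]; the unit comes on
    with a probability that approaches 1 as the net input rises above the
    unit's threshold (a logistic function is used here for concreteness).
    """
    net = x @ W
    p_on = 1.0 / (1.0 + np.exp(-(net - thresholds) / temperature))
    return (rng.random(p_on.shape) < p_on).astype(int)

# Toy dimensions: 8 input and 8 output feature units. The weights start at zero,
# so before any learning each output unit fires at chance (probability 0.5).
n_in, n_out = 8, 8
W = np.zeros((n_in, n_out))
thresholds = np.zeros(n_out)
x = rng.integers(0, 2, size=n_in)
print(forward(x, W, thresholds))
```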

In learning, the network is trained using Rosenblatt's perceptron convergence procedure. On a learning trial, the model is presented with the stem form of a word and its correct past tense. The stem form is encoded, and the activations of the Wickelfeature output units are computed. This computed representation is compared with the correct representation of the word's past tense. If the computed activation of a given unit matches the correct value, no learning occurs. If a unit that should be active is not, the weights to that unit from each active input unit receive a small fixed increment, and the threshold is reduced. Correspondingly, if a unit that should not be active is on, the weights from each active input unit are decremented and the threshold is increased. As a result, the network gradually improves its performance over many learning trials, simulating a gradual developmental process (Rumelhart & McClelland, 1986).
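
A corresponding sketch of a single learning trial is given below; the learning rate and toy dimensions are arbitrary illustrative choices, and only the qualitative update rule (increment weights and lower the threshold for missed activations, decrement and raise for false alarms) follows the description above.

```python
import numpy as np

def perceptron_update(x, y_computed, y_target, W, thresholds, lr=0.1):
    """One learning trial in the spirit of the perceptron convergence procedure.

    For each output unit: if it should be on but is off, increment the weights
    from every active input unit and lower the threshold; if it should be off
    but is on, decrement those weights and raise the threshold; otherwise leave
    everything unchanged.
    """
    error = y_target - y_computed        # +1 missed activation, -1 false alarm, 0 correct
    W += lr * np.outer(x, error)         # only active input units (x[i] = 1) are adjusted
    thresholds -= lr * error             # threshold moves opposite to the weight change
    return W, thresholds

# Toy trial with 4 input and 3 output units: the first target unit failed to come on.
W = np.zeros((4, 3))
thresholds = np.zeros(3)
x = np.array([1, 0, 1, 0])
y_target = np.array([1, 0, 1])
y_computed = np.array([0, 0, 1])
W, thresholds = perceptron_update(x, y_computed, y_target, W, thresholds)
print(W)           # weights from the active inputs to the missed unit increased
print(thresholds)  # that unit's threshold was lowered
```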

The single mechanism storage theory, namely connectionism, states that initially the neural network is able to accommodate all individual examples, but that as the size of the vocabulary increases, the network is forced to generalize, producing decreased performance on irregular verbs. However, this reliance on vocabulary growth and composition is also the Achilles heel of the model: the input to the model has to be carefully controlled in order to achieve the desired performance. For example, Rumelhart and McClelland increased their vocabulary from 10 to 420 words just before the onset of the decrease in performance on irregular verbs. More modern neural network models have more moderate schemes for increasing the vocabulary, but almost always with some growth spurt. Another problem is the composition of the vocabulary: despite the fact that regular verbs far outnumber irregular verbs, around 70% of verb tokens in actual use are irregular (the so-called token frequency). If this raw input of 70% irregular verbs is presented to a neural network, it cannot discover the regularity. With respect to errors, neural network models tend to produce many different types: overregularizations, irregularizations, blends, and other unaccountable errors. A final problem, notably for the more advanced three-layer backpropagation networks, is that they require feedback on their own productions, despite the well-known fact that parents do not consistently give feedback on syntactic errors. The pattern that emerges from these problems is that neural network models are underconstrained with respect to generalization. Their learning characteristics are determined too strongly by the input, producing a mismatch between learning in networks and actual human learning (Taatgen & Dijkstra, 2003).

2.1.2 Dual Mechanism Theory: The Words-and-Rules Theory

2.1.2.1 Words-and-Rules (WR) Theory

The Words-and-Rules (WR) theory, first proposed by Steven Pinker in 1991, claims that the regular-irregular distinction is an epiphenomenon of the design of the human language faculty, in particular the distinction between lexicon and grammar made in most traditional theories of language. The lexicon is a subdivision of memory containing (among other things) the thousands of arbitrary sound-meaning pairings that underlie the morphemes and simple words of a language. The grammar is a system of productive, combinatorial operations that assemble morphemes and simple words into complex words, phrases and sentences. Therefore, irregular forms are stored in memory (which associates words by their sound, meaning and spelling, allowing limited generalization by analogy to similar forms), whereas regular inflection is achieved by a rule (e.g., N_plural = N + -s, V_past = V_stem + -d), applied by default upon the failure to retrieve a stored irregular (or one highly similar to it) from memory. In other words, irregular forms are just words, acquired and stored like other words, but with a grammatical feature like “past tense” incorporated into their lexical entries. Regular forms, by contrast, can be productively generated by a rule, just like phrases and sentences. A stored inflected form of a verb blocks the application of the rule to that verb (e.g., brought pre-empts bringed). Elsewhere (by default), the rule applies: it concatenates -ed with the symbol “V”, and thus can inflect any word categorized as a verb (see Figure 2.3).

Figure 2.3 Simplified illustration of the Words-and-Rules (WR) theory and the Declarative/Procedural (DP) hypothesis (Pinker & Ullman, 2002)

As Figure 2.3 illustrates, when a word must be inflected, the lexicon and grammar are accessed in parallel. If an inflected form for a verb (V) exists in memory, as with irregulars (e.g., held), it will be retrieved; a signal indicating a match blocks the operation of the grammatical suffixation process via an inhibitory link from lexicon to grammar, preventing the generation of holded. If no inflected form is matched, the grammatical processor concatenates the appropriate suffix with the stem, generating a regular form.
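
This lookup-and-blocking logic can be sketched in a few lines of Python; the toy lexicon below is an illustrative stand-in rather than an exhaustive list, and the bare string concatenation ignores spelling adjustments such as consonant doubling.

```python
# Toy sketch of the lookup-and-blocking logic: the lexicon is searched for a
# stored past-tense form; a hit blocks the suffixation rule, otherwise the
# default rule concatenates -ed with the stem. The lexicon is an illustrative
# stand-in, and the bare concatenation ignores spelling rules such as
# consonant doubling (e.g., spam -> spammed).

IRREGULAR_LEXICON = {"hold": "held", "bring": "brought", "keep": "kept", "come": "came"}

def past_tense(stem: str) -> str:
    stored = IRREGULAR_LEXICON.get(stem)
    if stored is not None:   # memory retrieval succeeds and blocks the rule
        return stored
    return stem + "ed"       # elsewhere (by default) the regular rule applies

print(past_tense("hold"))  # held   ('holded' is blocked)
print(past_tense("walk"))  # walked (no stored form, so the rule applies)
print(past_tense("mosh"))  # moshed (novel verbs fall through to the rule as well)
```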

Irregular forms, then, do not require an “exception module”. They arise because the two subsystems overlap in their expressive power: a given combination of features can be expressed by words or by rules. Thus either a word (irregular) or a rule-product (regular) can satisfy the demand of a syntactic or semantic representation that a feature such as past tense be overtly expressed. Diachronically, an irregular is born when (for various reasons) learners memorize a complex word outright, rather than parsing it into a stem and an affix that codes the feature autonomously.

2.1.2.2 The Declarative/Procedural (DP) Hypothesis

The Declarative/Procedural (DP) hypothesis, to which the WR theory has recently been extended, is a hypothesis about the neurocognitive substrate of the lexicon and grammar. According to the DP hypothesis (Marcus et al., 1992), lexical memory is a subdivision of declarative memory, which stores facts, events and arbitrary relations (Coslett, 1986). The consolidation of new declarative memories requires medial-temporal lobe structures, in particular the hippocampus. Long-term retention depends largely on neocortex, especially temporal and temporo-parietal regions; other structures are important for actively retrieving and searching these memories. Grammatical processing, by contrast, depends on the procedural system, which underlies the learning and control of motor and cognitive skills, particularly those involving sequences (Coslett, 1988). It is subserved by the basal ganglia and by the frontal cortex to which they project; in the case of language, this means particularly Broca's area and neighboring anterior cortical regions. Irregular forms must be stored in the lexical portion of declarative memory; regular past-tense forms can be computed in the grammatical portion of the procedural system.

2.1.2.3 Weak Memory Entry

The key predictions of WR are: (1) that irregulars should have the psychological, linguistic and neuropsychological signatures of lexical memory, whereas regulars will often have the signatures of grammatical processing; and (2) that speakers should apply regular inflection whenever memory fails to supply a form for that category. A stored form may be unavailable for many reasons: low or zero frequency, lack of a similar form that could inspire an analogy, inaccessibility because of a word's exocentric structure, novelty of the form in childhood, and various kinds of damage to the neurological substrate of lexical memory.

According to these predictions, if a word is rare, its entry in the mental lexicon is weak. In such cases, irregular inflection will suffer but regular inflection will not. The irregularity of past-tense and past-participle forms can thus be put down to high frequency of occurrence: the most frequent forms have remained unchanged from ancient times until today, whereas the lower a word's frequency of occurrence, the more likely its irregular form is to be lost from the lexicon. This is consistent with the fact that Old English had twice as many irregular forms as Modern English; forms such as cleave-clove, crow-crew and abide-abode have since become obsolete. The force of analogy slowly reduces the number of irregular verbs over time, which explains why irregular verbs tend to be the most commonly used ones: verbs that are heard more rarely are more likely to switch to the regular pattern. Furthermore, Francis and Kucera (1982), studying frequencies in a million-word corpus, found that the ten most frequent verbs in English are all irregular, whereas the ten least frequent verbs are all regular (see Table 2.1). This further confirms the predictions of the Words-and-Rules (WR) theory.

Table 2.1 Frequency in a million-word corpus (Francis & Kucera, 1982)

Table 2.1 also implies that irregular forms have to be memorized to survive in a language. If an irregular verb declines in popularity, children will fail to remember its past tense and it will eventually become regular.

2.1.2.4 Neuropsychological Dissociations

The WR theory is further supported by evidence from neuropsychology. According to the WR theory and the DP hypothesis, damage to the neural substrate for lexical memory should cause a greater impairment of irregular forms (and of any regular forms that depend on memory storage), and a diminution of the tendency to analogize novel irregular-sounding forms according to stored patterns (as in spling-splung). In comparison, damage to the substrate for grammatical combination should cause a greater impairment of the use of the rule in regular forms, and of its generalization to novel forms.

Anomia is an impairment in word finding often associated with damage to left temporal/temporoparietal regions [see Figure 2.4(a)]. Patients often produce fluent and largely grammatical speech, suggesting that the lexicon is more impaired than grammatical combination (Pinker & Ullman, 2002). In elicited past-tense production tasks, patients (compared with controls) do worse with irregular than with regular verbs [see Figure 2.4(b)], produce regularization errors like swimmed (which occur when no memorized form comes to mind and the rule applies as the default), rarely analogize irregular patterns to novel words (e.g., spling-splung), and are relatively unimpaired at generating novel regular forms like plammed (Pinker & Ullman, 2002). Agrammatism, by contrast, is an impairment in producing fluent grammatical sequences, and is associated with damage to anterior perisylvian regions of the left hemisphere (Pinker & Ullman, 2002). As predicted, agrammatic patients show the opposite pattern: more trouble inflecting regular than irregular verbs, a lack of errors like swimmed, and great difficulty suffixing novel words. Similar effects have been documented in reading aloud, writing to dictation, repeating and judging words (even when controlling for frequency and length), and in a regular/irregular contrast with Japanese-speaking patients (Pinker & Ullman, 2002).

Figure 2.4 Dissociating regular and irregular processing in aphasia (Pinker & Ullman, 2002)

Figure 2.4(a) shows the approximate lesion sites of patient FCL (red area, left anterior perisylvian regions), who had symptoms of agrammatism, and patient JLU (green area, left temporo-parietal region), who had symptoms of anomia. In Figure 2.4(b), the results of verb inflection tests show that the agrammatic patient had more trouble inflecting regular verbs (lighter bars) than irregular verbs (darker bars), whereas the anomic patient had more trouble inflecting irregular verbs and overapplied the regular suffix to many of the irregulars (light green bar on top of dark green bar). The performance of age- and education-matched control subjects is shown in the grey bars.

Converging findings come from other neuropsychological methodologies. In normal subjects, both regular and irregular inflected forms can prime their stems. By hypothesis, a regular form is parsed into affix and stem (which primes itself), whereas an irregular form is associated with its stem, somewhat like semantic priming. Patients with left inferior frontal damage do not show regular priming (walked-walk), although they retain irregular priming (found-find) and semantic priming (swan-goose). A patient with temporal-lobe damage showed the opposite pattern (Pinker & Ullman, 2002). This suggests that the brain processes regular forms like syntactic combinations and irregular forms like words, further supporting the WR theory.

2.1.2.5 Homophonous Verbs

The single mechanism storage theory, namely connectionism as proposed by Rumelhart and McClelland, holds that both kinds of past-tense forms are generated by weighted connections in a connectionist pattern associator. All processing is accounted for using weighted phonological units (e.g., -ing to -ung for sing, -k to -kt for walk) that are strengthened with exposure and shared across phonologically similar stems, resulting in automatic generalization by similarity. This model contains no lexical entries or grammatical representations. By contrast, the WR account asserts that the distinction between regular and irregular verbs reflects the two ways language is represented and processed in the mind. Irregular past-tense forms are stored in the lexicon, a subdivision of associative memory, and as a result demonstrate strong effects of word frequency and phonological similarity. Regular past-tense forms, in general, are relatively insensitive to these variables because they may be assembled by a productive suffixing rule, which in this case adds -ed to the stem. The rule applies when memory fails to retrieve an irregular form, such as in the case of novel or low-frequency verbs. These rules belong to a grammatical system responsible for the construction of complex words and sentences.

Empirically, these two theories make different predictions in the case of homophonous verbs (e.g., rang the bell versus ringed the city, broke the vase versus braked the car). Since the phonological input units remain identical, these cases are problematic for an SPA model that incorporates only phonological features (e.g., Rumelhart & McClelland, 1986), because two items with identical input representations must be systematically mapped onto distinct output representations. In the WR theory, in which words have representations apart from their sounds, homophones with distinct past-tense forms are unproblematic because the irregular past-tense form is associated with a word and not simply a set of sounds. Moreover, novel verbs that are homophonous with irregular forms can receive a regular form whenever they are derived from a noun (e.g., ringed the city) or an adjective (e.g., righted the boat), because every irregular verb form is stored with a verb root, not with a set of verb sounds, and a verb based on a noun is not represented as having the same root as its homophonous pure verb (Pinker & Prince, 1988; Marcus et al., 1995).
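
The earlier toy sketch can be extended to show why homophones are unproblematic under this assumption: if stored irregulars are keyed to lexical roots rather than to sound strings, a denominal verb that merely sounds like ring never retrieves rang. The lemma entries and root labels below are purely illustrative.

```python
# Extending the toy sketch above: in WR, stored irregular forms are keyed to
# lexical roots (verb entries), not to sound strings. A verb derived from the
# noun "ring" shares no verb root with the irregular verb ring/rang, so it
# falls through to the default rule. The lemma dictionaries and root labels
# are illustrative assumptions, not a claim about actual lexical entries.

IRREGULAR_BY_ROOT = {"ring_V": "rang", "break_V": "broke"}

def past_tense(lemma: dict) -> str:
    stored = IRREGULAR_BY_ROOT.get(lemma["root"])
    if stored is not None:           # a stored form for this root blocks the rule
        return stored
    return lemma["form"] + "ed"      # otherwise the default -ed rule applies

ring_the_bell = {"form": "ring", "root": "ring_V"}    # the irregular verb root
ring_the_city = {"form": "ring", "root": "ring_N>V"}  # denominal verb from the noun "ring"

print(past_tense(ring_the_bell))  # rang
print(past_tense(ring_the_city))  # ringed
```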

Previous experimental evidence indicated that grammatical structure, rather than sheer semantic similarity, determines subjects' judgments of past-tense forms. Kim et al. (1992) presented existing and novel verbs that are homophonous with irregulars and found that verbs derived from nouns (e.g., to shed the tractor = “put in the shed”) were judged as requiring regular past-tense forms (shedded the tractor), whereas verbs that were merely metaphorically extended from their central sense were not (e.g., to shed the tractor = “get rid of possessions”). Although denominal verbs also happen to differ semantically from their irregular homophones, a regression analysis showed that only denominal status, not semantic similarity, predicted the degree of preference for regular or irregular forms.

To fully explore the interaction between phonology and semantics in inflectional morphology, it is necessary to vary them independently. Since connectionists hold that semantics itself plays a major role in the generalization of the past tense, in order to resolve whether verb meaning plays a direct role in inflection, we must also examine the effect of semantic similarity in cases of low and moderate phonological similarity between novel verbs and the existing verbs to which they are similar. By expanding comparisons to cases in which both phonological and semantic similarities are manipulated, one can see whether semantic similarity elicits a generalization gradient analogous to the one already known to exist for phonological similarity (e.g., Bybee & Moder, 1983; Prasada & Pinker, 1993).

Figure 2.5 Predicted pattern for the WR theory (Huang & Pinker, 2005)

The Words-and-Rules theory predicts that when people are asked to generate past-tense forms of novel verbs that vary in similarity to existing verbs, semantic similarity should have limited consequence for the generalization of irregular past-tense patterns (e.g., -ing, -ung) in cases of low and moderate phonological similarity, and should only lead to greater generalization in the case of high phonological similarity, where the combination of phonological and semantic similarity evokes a particular existing verb (see Figure 2.5). Conversely, the Single Pattern Associator theory predicts that increases in semantic similarity would lead to greater generalization of an irregular past tense across all levels of phonological similarity.
