Saturday, June 21, 2014

Baker's Paradox IV: Transformation and Variation

How does the learner acquire the following patterns in dative constructions:

(1) a. John told a story to Bill.
       John told Bill a story.
    b. John promised a car to Bill.
       John promised Bill a car.
    c. John donated a painting to the museum/them.
      *John donated the museum/them a painting.

Lexical conservatism is not the way to go. Children productively (over)generalize both constructions (“I said him no”) about 5% of the time (Gropen et al. 1989 Lg.), a rate comparable to that of past tense overregularization. As young as age 3, they can extend novel verbs from one construction to the other (“I pilked the cup to Petey” => “I pilked Petey the cup”; Conwell & Demuth 2007 Cognition), though extension from the double object construction (DOC) to the prepositional construction (PC) is more robust than the other way around.

There is pretty good agreement on the semantic conditions for the dative constructions: DOC generally involves caused possession of the theme by the goal, and PC requires caused motion of the theme along the path to the goal. These are what Pinker (1989) calls “broad range rules”, but they are clearly only necessary, not sufficient, conditions on the dative constructions, as the examples in (1) illustrate. Moreover, there is considerable crosslinguistic variation: in some languages, (the equivalent of) dative constructions are limited to a handful of verbs.

Pinker then proposes a set of “narrow range rules”, each defining a subclass of verbs on the basis of semantics: verbs of instantaneous causation of ballistic motion (“throw”), verbs of future having (“leave”), verbs of instrument of communication (“telegraph”), etc., which allow DOC, and verbs of fulfilling (“present”), verbs of manner of speaking (“shout”), etc., which allow PC only. Beth Levin refined these lists in her 1993 EVCA book. But as noted by Melissa Bowerman and others, these subclasses do not solve the learning problem. First, it’s not clear how the child learner can conjure up these subclasses: we probably don’t want to build the telecommunication class into an innate UG. Second, these subclasses do not behave consistently across languages (Levin 2008 Stanford ms.); even if they are available for the learner’s consideration, their productivity still needs to be determined.

You know where we are going with this. I looked at a 3-million-word corpus of child-directed English and found a total of 49 verbs attested in either dative construction:

(2) a. 48 appear in PC, of which 37 also appear in DOC.
    b. 38 appear in DOC, of which 37 also appear in PC.

Applying the N/ln N formula, we see that both PC=>DOC and DOC=>PC are productive generalizations. That is, if the child sees a verb used in one of the constructions, he will automatically extend it to the other. This appears to be what children do; see above. The DOC=>PC rule, which is virtually exceptionless, is a far more reliable generalization than the PC=>DOC rule, which may account for the asymmetry in the extension of novel verbs in Conwell and Demuth’s study.
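
The arithmetic behind this claim is easy to check. The sketch below is only an illustration: it assumes that N/ln N is read as the maximum number of exceptions a rule over N items can tolerate, and the function names are made up here. The counts are those in (2).

```python
import math

def tolerance_threshold(n):
    """Assumed reading of the N/ln N formula: max exceptions tolerated over n items."""
    return n / math.log(n)

def is_productive(n, exceptions):
    """A rule counts as productive if its exceptions do not exceed the threshold."""
    return exceptions <= tolerance_threshold(n)

# Counts from (2): 48 verbs in PC, 38 in DOC, 37 in both.
pc_total, doc_total, both = 48, 38, 37

pc_only = pc_total - both    # verbs that would be exceptions to PC=>DOC
doc_only = doc_total - both  # verbs that would be exceptions to DOC=>PC

pc_to_doc = is_productive(pc_total, pc_only)
doc_to_pc = is_productive(doc_total, doc_only)

print(f"PC=>DOC: {pc_only} exceptions, threshold {tolerance_threshold(pc_total):.1f}, productive={pc_to_doc}")
print(f"DOC=>PC: {doc_only} exceptions, threshold {tolerance_threshold(doc_total):.1f}, productive={doc_to_pc}")
```

On these counts PC=>DOC clears the threshold only narrowly (11 exceptions against roughly 12.4), while DOC=>PC clears it with room to spare (1 exception against roughly 10.4).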

So there is no Baker’s paradox for a 3 year old, as both constructions can be productively learned. The paradox arises for certain verbs such as the Latinate class, but there are hardly any Latinate dative verbs in the child-directed data (and not a single instance of the telecommunication verbs; these are data collected before everyone was online). As the child grows older, especially after the onset of literacy, which will begin to feature more Latinate words, his vocabulary will expand and he will encounter more examples of dative constructions: some verbs will appear in both DOC and PC while others will only appear in PC. But even the ungrammaticality of Latinate verbs in DOC is a matter of tendency, not to mention individual variation. Latinate verbs such as “assign”, “advance”, “award”, “guarantee”, etc. do allow DOC, and Germanic verbs such as “shout”, “trust”, “lift”, “pick” do not. Collectively, Gropen et al.’s list contains 54 Latinate verbs that can participate in PC but only 14 can be used in DOC: Latinate verbs, then, do not productively participate in DOC and the learner will have to lexicalize the 14. Levin’s longer list shows the same pattern.

So the child grows into a paradox: in other words, the productivity of rules/constructions must change over the course of language acquisition. Gropen et al. (1989) list 73 DOC/PC verbs and 34 PC-only verbs for a total of 107, which yields a threshold of 23. If the child learns all of the 107 verbs, the PC=>DOC extension will no longer be justified, since the 34 PC-only verbs exceed that threshold. A rule that was productive when he was three will cease to be productive when he’s 30.
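
To make the change concrete, here is the PC=>DOC calculation at the two stages. This is a sketch under the assumption that a rule is productive when its exceptions do not exceed N/ln N; the stage labels and the helper name are illustrative, and the counts come from (2) and from the Gropen et al. list.

```python
import math

def tolerates(n, exceptions):
    """True if `exceptions` stays within the assumed N/ln N threshold for n items."""
    return exceptions <= n / math.log(n)

# (N = verbs attested in PC, exceptions = PC-only verbs) at each stage.
stages = {
    "child-directed corpus": (48, 11),
    "Gropen et al. 1989 list": (107, 34),
}

results = {}
for label, (n, exceptions) in stages.items():
    threshold = n / math.log(n)
    results[label] = tolerates(n, exceptions)
    print(f"{label}: N={n}, exceptions={exceptions}, "
          f"threshold={threshold:.1f}, productive={results[label]}")
```

The same rule passes on the smaller vocabulary and fails on the larger one, because the number of PC-only verbs grows faster than the threshold does.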

I think this is when the child will be prompted to look for subclasses or narrow range rules. Not having a productive linguistic system is a crime against nature. Sometimes we are genuinely stuck when there isn’t any to be found (such as the paradigmatic gap examples I mentioned in the previous post) but the child will not give up trying. In a paper published in the same volume as Berwick, Chomsky and Piattelli-Palmarini, Julie Legate and I studied how the metrical stress parameters of English can be acquired. It’s well known that the overwhelming majority of English words are stress-initial (up to 80-90%; Cutler & Carter 1987, Comp. Speech & Lg.), but no metrical theory of English, or any English speaker, treats English as a quantity insensitive (QI) system like Afrikaans while lexically listing 10% of exceptions. Using child-directed English words, we found that indeed, the QI system fails to reach productivity despite covering the overwhelming majority of words, and a productive system (as described in Halle 1998 LI) can only be established if the child subdivides the vocabulary into nouns and verbs and considers different stress marking options for these subclasses. Conceivably, this is how children learn the narrow range rules: OK, PC=>DOC may be bust, but if I cut up the verbs into semantic classes, I can still find some productive ones.

This work has tormented me for quite some time. I have argued for a variational conception of language learning, where the learner acquires a probabilistic distribution over grammatical hypotheses. This contrasts with what can be called a “transformational” model of learning, where the learner goes from one grammar to another. Yet what we have on hand is exactly a transformational model of language à la hypothesis testing (see Aspects), where the hypotheses are confirmed or rejected by an evaluation metric for productivity.

There really do seem to be two kinds of learning in child language. On the one hand, there is probabilistic adjustment, where non-target grammars show up. The case for parameters remains strong; I hope to provide a report on some recent collaborative work soon. On the other, we have tipping point phenomena such as U-shaped learning curves and other forms of linguistic induction, where a hypothesis suddenly emerges.

I’m happy to concede that I’m treating unattested examples as negative evidence. As noted earlier, the child must be able to generalize over unseen data, so that much seems unavoidable. But I still think this work is different from at least the conventional use of indirect negative evidence. Under the standard view, the learner has two (or many) hypotheses and performs some kind of comparison, discrete or probabilistic, to select the best. (For a recent take on dative acquisition, see Perfors et al. JCL 2010 and Villavicencio et al. ACL 2013.) The model developed here considers one hypothesis at a time by working over two numbers: it keeps a hypothesis that is good enough and moves on to find another if not. This is the classic error-driven learning in much of the inductive learning business (Aspects, Wexler & Culicover, Berwick).
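
The one-hypothesis-at-a-time procedure can be sketched as a simple loop. This is a toy illustration rather than the model itself: the ordering of candidates, the tuple layout and the function name `first_productive` are assumptions made here, and "good enough" is taken to mean that exceptions do not exceed N/ln N. The numbers are the ones cited in this post (48/11 from the child-directed corpus; 107/34 and 54/40 from Gropen et al.'s list).

```python
import math

def first_productive(hypotheses):
    """Test candidates one at a time; keep the first whose exceptions fit under N/ln N."""
    for name, n, exceptions in hypotheses:
        if exceptions <= n / math.log(n):
            return name   # good enough: keep it and stop searching
    return None           # no candidate passed: fall back on lexical listing

# Each candidate is (label, N items in scope, exceptions); the order is an assumption.
early_vocabulary = [("PC=>DOC, all verbs", 48, 11)]
later_vocabulary = [
    ("PC=>DOC, all verbs", 107, 34),
    ("PC=>DOC, Latinate verbs", 54, 40),
]

early_choice = first_productive(early_vocabulary)
later_choice = first_productive(later_vocabulary)
print(early_choice)
print(later_choice)
```

With these inputs the first call keeps the general rule, while the second runs out of candidates. Further candidates, such as narrow semantic subclasses, would simply be appended to the list and tested in turn; no two hypotheses are ever compared against each other.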

In any case, I think the empirical aspects of productivity are far more important than theoretical formulations and deserve much more attention: 

  • A productive system requires a super-duper majority: see English metrical stress.
  • Productivity can change over the course of language acquisition.
  • The failure of productivity results in ineffability such as paradigmatic gaps: sometimes the best isn't good enough. 

Off to hiking in Yunnan, with some Peking ducks along the way. 
