Friday, January 11, 2013

Back to the Woodshed: Lila on Paul and Norbert on Learning

Lila disagrees with the way that Paul and I have been putting matters concerning learning. This smarts! But there is some small recompense in being beaten up by a real heavyweight. Here's what she has to say.

I think both Paul and Norbert have drawn the lines here too oversimply, and perhaps not even in the right places. I doubt that the contrast for earliest word learning is merely between “quick” essentially one-trial learning (hence, by implication, recollecting what you previously knew in Plato’s terms) triggered by a suitable (contingent) input, vs a slow incremental procedure that gathers and evaluates evidence delivered partially, across many encounters. The first problem is that the difference in expressive range between a human (even 3 year old) vocabulary and a moth-wing coloration is so large as to make analogizing between these grotesque. The 40000 or so common words really all do mean different things (pace Hume, who had a constructionist story, but alas one that doesn’t work). Second, the relation between what’s acquired and the contingent trigger is nonaccidental for wing colors and bee navigational guidance just as Paul says, whereas (as I blush even to say in present company) the sound and the meaning of a word are not only arbitrarily related but arbitrarily varying across languages (Suppose that there were something like “phonetic symbolism” such that the big animals were labeled with multisyllabic words, etc., this might be a better analogy with the wing-color situation; but alas this isn’t so either). Accordingly, contra Norbert, the learning of even first words is – perhaps paradoxically – slow (in the sense that such knowledge usually appears after, probably, many many encounters with a word-sound) but quick (in that it appears suddenly, immediately, if we are right, in the presence of the “right” encounter, which may be the first encounter or the 150th encounter). The problem is the selection of the right encounter. The usual (associationist) story is really very compelling on a commonsensical basis – it’s probably true that /dog/ occurs more systematically in the presence of dogs than does /aardvark/, etc., so it seems reasonable that one would be collecting and comparing across successive encounters. But the problem (even leaving aside our and many other experimental demonstrations that memory won’t support such a procedure) is that the set of concepts and the set of observations are both so large – essentially open – as to make it unlikely in the extreme that a Mind could carry through the collect, contrast, compare procedure required to extract the needed generalization (/dog/ refers to a dog qua dog). Some progress may come from noting the temporal parameters within which observation of a dog, utterance of /dog/, and mutual attention to dogs (e.g., joint visual fixation) co-occur, something we’re working on (maybe this will deserve the title “representational causation”). But I think it’s premature to be happy at all about where we are in thinking about POS and word learning.

Because the next problem, really it is the real first problem, is that most words cannot be acquired at all if the procedure is limited to mapping between a sound and “the world.”   For many many words, the world plays almost no role at all in constraining the potential meaning (try “think” or “probably” or “similar” or “fair”) and for most it plays a subsidiary role, with linguistic information providing the bulk of the information (information from “the world” is generally brushed off and ignored both by adults and 3 year olds when pitted against linguistic information, amazing!).    How soon we forget:  Because there are some suggestions now that learning is on single trials (plus a confirmation trial) one tends to forget a longer line of evidence that most words are learned via a domain-specific machinery that examines their linguistic licensing conditions. The tiny set of words that are acquired absent linguistic information is however crucial because (1) they are the very first words, and for necessary reasons; (2) they provide the enabling information for building the language-specifics of structured representations within which all subsequent meanings are learned (specifically, you have to find out where the subject of the clause is, in your language, and you do this, basically, by seeing where  in sentences “dog” shows up, within dog scenarios); (3) speaking systematically, all theoretical questions are begged unless there is a NONlinguistic way of learning some ‘seed words’ (otherwise you are trapped saying you learned the syntax from the meanings and you learned the meanings from the syntax); the procedure has to be grounded by a first, domain-general, procedure for confronting the world; (4)  this procedure is exceedingly limited, I believe, but attainable even by very lowly animals taught stupid tricks in the laboratory by psychologists.   Call this learning.    
Because it is an “outside in” procedure designed by nature to work no matter how disorganized and unsystematic your present circumstances. Not “acquisition,” applicable to the “inside out” procedures to follow. You need a learning procedure, linked to the world, to get into that system, which, though it is UG at bottom, is well disguised at the surface (e.g., as English, vs Urdu, vs ASL…). Your mother just says blah-blah-blah instead of NP VP, there’s the rub.


  1. Lila implies that all the "interesting" action in word learning arises when we start considering syntactic bootstrapping. That's what I believe as well, based on her work. Where we seem to disagree (and believe me this is NO FUN FOR ME!!!) is whether the first steps, the initial words used to prime the linguistic system, are properly characterized as 'learned.' Following her lead in the cited papers with colleagues, I concluded that this was not learning, for it failed to have the signature properties I associated with learning. A fair retort is that 'learning' is not a technical term and that there can be many flavors. This has been a topic pursued in some of Paul's posts, with Alex Drummond and Alex Clark making useful clarifications. Right now, my tentative view is that we should still drop 'learning' as a term, for it confuses matters too much, for roughly the reasons that Lila notes above. There is a common sense version and it is the associationist one. This one seems wrong, as Lila has argued, BOTH for initial word learning and later word learning. If so, the common sense use of the term is inapposite, as it is misleading. We can, of course, try to rehabilitate the common sense word, but I think that this is hopeless. Better to dump it and start inserting the technical notions. As 'acquisition' is a nice anodyne word, it can serve as a suffix for various kinds: data-driven acquisition, one-trial acquisition, guess-and-stick acquisition, etc. At least we won't then all get misled by common misconceptions that can, unfortunately, encumber further progress.

    One last point: Lila makes an important point about priming the linguistic system. We cannot assume that it primes itself. Chomsky has a useful term for this: 'epistemological priority.' At any rate, I agree that whatever is going on initially is required for the language system to kick in. It is not itself supported by FL. Of course, once primed, FL kicks in with a vengeance, and then very fast, hand-over-fist vocab acquisition can begin in earnest.

  2. Just to clarify...I wasn't suggesting that lexical items are related to aspects of the environment in the way that wing color is related to temperature-during-the-relevant-period-of-development in certain butterflies. I was suggesting that contingent features of the environment can have a striking (and fitness-enhancing) effect on phenotype, even in cases that are not helpfully described as learning. (That's worth bearing in mind, I think, in discussions of learning and parameter setting.) But I agree entirely that in order to acquire a lexicon that matches the one their parents are using, kids will need to do some kind of triangulation on some seed words--and in that sense, kids need to exploit some kind of outside-in procedure, in order to use their linguistic capacities to acquire a lexicon that will support communication. I agree with that, and with just about everything else Lila says. But I do think it's worth leaving room for the possibility that humans can acquire I-languages that don't support communication (say because the phonology is null).

    Is the requisite outside-in procedure helpfully described as learning? I don't know. When I was an undergrad in Psych 101, they told me what learning was, and I had a hard time believing that it accounted for interesting psychological phenomena. Now my friends in psychology tell me that learning does account for interesting psychological phenomena, and we just need to figure out what learning is. I honestly do believe that's progress. But until someone can give me a handle on how to understand 'learning' in the context of theories, I share Norbert's reluctance to put weight on the commonsense notion, which has proven remarkably slippery (ever since Plato). I can't do better than Randy Gallistel's notion of using information from the environment to set the value of a variable in some algorithm. And I'm willing to run with that characterization of learning for the time being, suppressing worries about how to understand 'variable' and about whether the butterflies learned the temperature. So if the requisite outside-in procedure uses information from the environment to set the value of a relevant variable, then I'll say that the requisite outside-in procedure counts as (Gallistel) learning.

  3. I like "acquisition" or even "growth," as Chomsky often notes, as this allows for clarification about the requisite input, the speed of change, and the range of both possible and impossible outcomes. This is useful because it allows for comparisons across species, which I think is one important aspect of the connection Paul was making between language and wing color. In the arena of imitation, there were long and heated debates about what folks meant by imitation. Ultimately, this area developed a taxonomy of terms that allowed for very clear inquiry into the underlying mechanisms. Thus, we have, for example, a variety of social learning processes, but they are different: observational learning, social facilitation, imitation, goal emulation, and teaching. They are all forms in which there is some necessary social situation that is required for acquisition.