Faculty of Language: There's No There There

Tuesday, February 5, 2013

There's No There There

I grew up in a philosophy department and so I always feel a warm glow of nostalgia when an eminent practitioner of this dark art turns his/her attention in a book to my tiny areas of interest. Recently, Jesse Prinz, Distinguished Professor in Philosophy at CUNY, has directed his attention to the shortcomings of rationalist efforts in the mental sciences in an effort to resuscitate empiricist conceptions of mind. The book, Beyond Human Nature, is very much worth looking at (but for god’s sake don’t buy it!) for it is a good primer on just how little empiricist conceptions, despite the efforts of their mightiest minds, have to offer those seriously interested in cognition. I’m not talking modest contributions, I’m talking NADA! Before proceeding, some warnings. I am a severe thaasophobe, with an attention span that requires quick capture. Banalities and weasel wording can induce immediate narcoleptic seizure. Prinz held me to the end of chapter 6 before Morpheus barred further progress. I never fight with gods. Consequently, my remarks here are limited to the first six chapters and, of these, concentrate mainly on chapter 6, which is dedicated to what I know best, the Chomsky program in generative grammar. With this caveat, let’s push forward, though caveat lector: this post is way too long.

My main objection to the book is that it refuses to play by the rules of the game. I have discussed this before (here), but it is worth reviewing what is required to be taken seriously. Here are the ground rules.

First, we have made many empirical discoveries over the years and the aim must be to explain these facts. In linguistics these include Island effects, fixed subject effects, binding theory effects etc. I know I have repeated myself endlessly about this, but it seems that no matter how often we emphasize this, critics refuse to address these matters. Prinz is no exception, as we shall see.

Second, if one is interested in not merely debunking generative grammar but the whole rationalist enterprise in cognition then attention must be paid to the results there, and there have been many. We have reviewed some by Gleitman and Spelke (here, here) but there are many more (e.g. by Baillargeon on causality and Wynn on numbers a.o.). Prinz touches on these but is coy about offering counter analyses. Rather he is satisfied with methodological reflections on the difficulties this kind of work must deal with and dismisses 40 years of research and literally hundreds of detailed proposals by pointing out the obvious, viz. that experiments must be done carefully and that this is hard. Not exactly big news.

Third, though acknowledging these data points is a necessary first step, more is required. In addition one must propose alternative mechanisms that derive the relevant facts. It is not enough to express hopes, desires, expectations, wishes etc. We need concrete proposals that aim to explain the phenomena. Absent this, one has contributed nothing to the discussion and has no right to be taken seriously.

That’s it. These are the rules of the game. All are welcome to play. So what does Prinz do? He adopts a simple argumentative form, which can be summarized as follows.

1. He accepts that there are biological bases for cognition but holds that they vastly underdetermine human mental capacities. He dubs his position “nurturism” (just what we need, another neologism) and contrasts it with “naturism.”[1]

2. His main claim is that cognition has a heavy cultural/environmental component and that rationalism assumes that “all brains function in the same way…” and “our behavior is mostly driven by biology” (102).

3. He reviews some of the empirical arguments for rationalism and concludes that they are not apodictic, i.e. it is logically possible that they are inconclusive.

4. He buttresses point 3 by citing work purporting to show methodological problems with rationalist proposals. Together 3 and 4 allow Prinz to conclude that matters are unsettled, i.e. to declare a draw.

5. Given the draw, the prize goes to the “simpler” theory. Prinz declares that less nativism is always methodologically preferable to more and so given the empirical standoff, the laurel goes to the empiricists.

That’s the argument. Note what’s missing: no counter proposals about relevant mechanisms. In short, Prinz is violating the rules of the game, a no-no. Nonetheless, let’s look a bit more into his argument.

First, Prinz allows that there is some innate structure to minds (e.g. see around 152).[2] The question is not whether there is native structure, but how much and what kind. For Prinz, associationist machinery (i.e. anything coming in through the senses with any kind of statistical massaging) is permissible. Domain specific modular knowledge with no simple perceptual correlates is not (cf. 171).

This is standard associationism at its grubbiest. So despite his insistence that the truth must lie somewhere between the naturist and nurturist extremes, Prinz erects his standard on pretty conventional empiricist ground. No modularity for him. It’s general learning procedures or nothing.

Why does Prinz insist on this rather naïve version of empiricism? He wants to allow for cultural factors to affect human mental life. For some reason, he seems to think that this is inconsistent with rationalist conceptions of the mind. Why is beyond me. Even if the overall structure of minds/brains is the same across the species, this does not prevent modulation by all sorts of environmental and cultural factors. After all, humans have four-chambered hearts as a matter of biology, but how good an individual heart is for marathons is surely heavily affected by cultural/environmental factors (e.g. training regimens, diet, altitude, blood doping etc.).

So too with cognition. Indeed, within linguistics, this has been recognized as a boundary condition on reasonable theorizing since the earliest days of generative grammar. The standard view is that UG provides design specifications for particular Gs, and particular Gs can be very different from one another. In standard P&P theories the differences are tied to different parameter settings, but even non-parameter theories recognize the fact of variation and aim to explain how distinct Gs can be acquired on the basis of PLD.
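A toy illustration of how one invariant design specification plus a parameter can yield quite different surface strings. The head-directionality parameter is a commonly cited textbook example that I have chosen; the post names no specific parameter, and the nested phrase below is hypothetical. Real parameter theories are far richer than this sketch.

```python
# A hypothetical phrase as nested (head, complement) pairs:
# [want [to [eat rice]]]
PHRASE = ("want", ("to", ("eat", "rice")))


def linearize(tree, head_initial):
    """Spell out a (head, complement) tree as a word list.

    The hierarchy is fixed; the head-direction parameter only decides
    whether each head precedes or follows its complement.
    """
    if isinstance(tree, str):  # a bare word
        return [tree]
    head, complement = tree
    head_words = linearize(head, head_initial)
    comp_words = linearize(complement, head_initial)
    return head_words + comp_words if head_initial else comp_words + head_words


print(" ".join(linearize(PHRASE, head_initial=True)))   # want to eat rice
print(" ".join(linearize(PHRASE, head_initial=False)))  # rice eat to want
```

The hierarchical structure is identical in both runs; only the parameter value differs. That is the sense in which UG can be invariant while particular Gs vary.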

Indeed, one of the standard arguments for some cognitive invariance (i.e. UG) arises from the fact that despite all the attested variation among particular Gs, they have many properties in common. Comparative syntax and the study of variation have been the source of some of the strongest arguments in favor of postulating a rich domain specific UG. In short, the problem from the outset has been to explain both the invariance and the variation. Given all of this, Prinz’s suggestion that rationalists ignore variation is simply mystifying.[3]

Moreover, he seems ignorant of the fact that to date this is really the only game in town. Prinz is staking a lot on the new statistical learning techniques to supply the requisite mechanisms for his empiricism. However, to date, purely statistical approaches have had rather modest success. This is not to say that stats are useless. They are not. But they are not the miracle drug that Prinz seems to assume they are.

This emerges rather clearly in his discussion of that old chestnut, the poverty of the stimulus argument (POS), using the only example that non-linguists seem to understand, polar questions. Sadly, Prinz’s presentation of the POS demonstrates once again how subtle the argument must be, for he clearly does not get it. The problem (as Paul Pietroski went over in detail here, and as I reviewed again here) is to explain constrained homophony (i.e. the existence of systematic gaps in sound-meaning pairings). It is not to explain how to affix stars, question marks and other diacritics to sentences (i.e. not how to rank linguistic items along an acceptability hierarchy). There has been a lot of confusion on this point and it has vitiated much of the criticism of Chomsky’s original argument. The confusion likely stems from the fact that whereas an acceptability hierarchy is a standard byproduct of a theory of constrained homophony, the converse is not true, i.e. a theory of acceptability need not say much about the origins of constrained homophony. But as the question of interest is how to relate sound and meaning (viz. the generative procedures relating them), simply aiming to distinguish acceptable from unacceptable sentences is to aim in the wrong direction.

Why is this important? Because of the myriad dumb critiques of Chomsky’s original POS argument that fail precisely because they misconstrue the explanandum. The poster child of this kind of misunderstanding is Reali and Christiansen (R&C), which, of course, Prinz points to as providing a plausible statistical model for language acquisition. As Prinz notes (2513), R&C’s analysis counts bigram and trigram word frequencies and, from such counting alone, is able to discriminate (1) from (2).

(1) Is the puppy that is barking angry?

(2) Is the puppy barking is angry?

Prinz is delighted with this discovery. As he says:

This is an extremely important finding. By their second birthday, children have heard enough sentences to select between grammatical and ungrammatical questions even when they are more complex than the questions they have heard (loc 2513).
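To see concretely what a string scorer of this kind does, here is a minimal sketch of an add-one smoothed bigram model. The toy corpus, the smoothing method and the scoring function are my own illustrative assumptions; R&C trained on real child-directed speech and also used trigrams, so this is not their model. On this invented corpus the scorer does rank (1) above (2), the kind of result Prinz celebrates.

```python
import math
from collections import Counter

# Hypothetical toy "child-directed" corpus, invented for illustration;
# it is NOT the corpus Reali & Christiansen used.
CORPUS = [
    "is the puppy that is sleeping hungry",
    "is the dog that is barking angry",
    "the puppy that is barking wants food",
    "the puppy is angry",
    "is the cat that is barking scared",
]


def tokenize(sentence):
    """Lowercase, drop the question mark, split on whitespace."""
    return sentence.lower().replace("?", "").split()


def bigrams(tokens):
    """Adjacent word pairs, with sentence-boundary markers added."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))


def train(corpus):
    """Count bigrams and how often each word is a bigram's left member."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        for left, right in bigrams(tokenize(sentence)):
            context_counts[left] += 1
            pair_counts[(left, right)] += 1
    vocab = {w for s in corpus for w in tokenize(s)} | {"</s>"}
    return context_counts, pair_counts, len(vocab)


def log_score(sentence, model):
    """Add-one smoothed bigram log-probability of a word string."""
    context_counts, pair_counts, vocab_size = model
    return sum(
        math.log((pair_counts[(left, right)] + 1) / (context_counts[left] + vocab_size))
        for left, right in bigrams(tokenize(sentence))
    )


model = train(CORPUS)
s1 = log_score("Is the puppy that is barking angry?", model)
s2 = log_score("Is the puppy barking is angry?", model)
print(s1 > s2)  # True on this toy corpus
```

Note what the function returns: one number per word string. Nothing in it relates a string to a meaning, so even when it ranks (1) above (2) it says nothing about which readings a string lacks. Given (3) it would return a single score, with no way to register that one of the conceivable readings is unavailable.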

The problem, however, is that even if this is correct, the R&C proposal answers the wrong question. The question is why kids can’t form sentences like (2) with the meaning “is it the case that the angry puppy is barking” on analogy with (1)’s meaning “is it the case that the barking puppy is angry.” This is the big fact. And it exists quite independently of the overall acceptability of the relevant examples. Thus (3) carries only the meaning we find in (1), not (2) (i.e. (3) cannot mean “is it the case that the puppy that barked was the one that Bill kissed.”).

(3) Did the puppy Bill kissed bark?

This is the same fact that (1) and (2) illustrate, but with no unacceptable string to account for, i.e. no analogue of (2). Bigrams and trigrams are of no use here. What we need is a rule relating form to meaning and an explanation of why some conceivable rules are absent, resulting in the inexpressibility of some meanings by some sentences. Unfortunately for Prinz, R&C’s proposals don’t even address this question, let alone provide a plausible answer.
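The point can be put in terms of what the data actually are. The sketch below records judgments as string-meaning pairs, derives string acceptability as “some meaning pairs acceptably with this string,” and finds the strings on which any classifier that sees only strings must err. The meaning labels are informal paraphrases I have made up for illustration, not a semantic formalism.

```python
# Hypothetical meaning labels: informal paraphrases, not a semantic formalism.
BARKING_ONE_ANGRY = "is it the case that the barking puppy is angry"
KISSED_ONE_BARKED = "is it the case that the puppy Bill kissed barked"
BARKED_ONE_KISSED = "is it the case that the puppy that barked was the one Bill kissed"

S1 = "is the puppy that is barking angry"
S3 = "did the puppy bill kissed bark"

# Judgments about string-meaning pairs, as reported in the post for (1) and (3).
ACCEPTABLE = {(S1, BARKING_ONE_ANGRY), (S3, KISSED_ONE_BARKED)}
UNACCEPTABLE = {(S3, BARKED_ONE_KISSED)}


def string_acceptable(s):
    """Derived notion: s is acceptable iff some meaning m makes (s, m) acceptable."""
    return any(string == s for string, _ in ACCEPTABLE)


def conflicting_strings():
    """Strings that occur in both an acceptable and an unacceptable pair.

    Any classifier that looks only at the string gives both pairs the same
    verdict, so it misclassifies one of them.
    """
    good = {s for s, _ in ACCEPTABLE}
    bad = {s for s, _ in UNACCEPTABLE}
    return good & bad


print(string_acceptable(S3))  # True
print(conflicting_strings())  # {'did the puppy bill kissed bark'}
```

Here the only such string is (3): it is perfectly acceptable as a string, yet it pairs with one meaning and not the other, so a string classifier must get one of the two pairs wrong.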

Why do Prinz and R&C so completely misunderstand what needs explaining? I cannot be sure, but here is a conjecture. They confuse data coverage with using data to probe structure. For Chomsky, the contrast between (1) and (2) results from the fact that (1) can be generated by a structure dependent grammar while (2) cannot be. In other words, these differences in acceptability reflect differences in possible generative procedures. It is generative procedures that are the objects of investigation, not the acceptability data. As Cartwright argued (see here), empiricists are uncomfortable with the study of underlying powers/structures, viz. here the idea that there are mental powers with their own structural requirements. Empiricists confuse what something does with what it is. This confusion is clearly at play here with the same baleful effects that Cartwright noted are endemic to empiricist conceptions of scientific explanation.
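The contrast between a structure-independent and a structure-dependent generative procedure can be sketched as follows. The bracketing of the declarative is supplied by hand and both rules are simplified textbook versions, assumptions made for illustration rather than anyone’s actual grammar.

```python
# A hand-built, hypothetical analysis of the declarative behind (1):
# [subject the puppy that is barking] [aux is] [predicate angry]
SUBJECT = ["the", "puppy", "that", "is", "barking"]
MAIN_AUX = "is"
PREDICATE = ["angry"]


def linear_question(words):
    """Structure-independent rule: front the first 'is' in the word string."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def structural_question(subject, aux, predicate):
    """Structure-dependent rule: front the auxiliary of the main clause,
    i.e. the one that follows the entire subject phrase."""
    return [aux] + subject + predicate


declarative = SUBJECT + [MAIN_AUX] + PREDICATE
print(" ".join(structural_question(SUBJECT, MAIN_AUX, PREDICATE)))
# is the puppy that is barking angry
print(" ".join(linear_question(declarative)))
# is the puppy that barking is angry
```

Both functions are generative procedures mapping a declarative to a question; only the structure-dependent one yields (1), while the linear one yields an ill-formed question. What is under investigation is which kind of procedure children’s grammars use, not the strings themselves.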

I could go on sniping at Prinz’s misunderstandings and poor argumentation. And I think I will do so to make two more points.

First, Prinz really seems to have no idea how poor standard empiricist accounts have been. Classical associationist theories have been deeply unsuccessful. I want to emphasize this, for Prinz sometimes leaves the impression that things are not nearly so hopeless. They are. And not only in the study of humans, but in mammal cognition quite generally.

Gallistel is the go-to guy on these issues (someone I am sure Prinz has heard of; after all, he teaches just across the bridge at Rutgers). He and King review some of the shortcomings in Memory and the Computational Brain, but there is a more succinct recapitulation of the conceptual trials and empirical tribulations of the standard empiricist learning mechanisms in a recent paper (here). It’s not pretty. Not only are there a slew of conceptual problems (e.g. how to deal with the effects of non-reinforcement (69)), but the classical theories fail to explain much at all. Here’s Gallistel’s conclusion (79):

Associationist theories have not explained either the lack of effect of partial reinforcement on reinforcements to acquisition or the extinction-prolonging effect of partial reinforcement. Nor have they explained spontaneous recovery, reinstatement, renewal and resurgence except by ad hoc parametric assumptions…I believe these failures derive from the failure to begin with a characterization of the problem that specific learning mechanisms and behavioral systems are designed to solve. When one takes an analysis of the problems as one’s point of departure…insights follow and paradoxes dissolve. This perspective tends, however, to lead the theorist to some version of rationalism, because the optimal computation will reflect the structure of the problem, just as the structure of the eye and the ear reflect the principles of optics and acoustics.
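To make Gallistel’s complaint concrete, here is a sketch using the Rescorla-Wagner rule, a standard associative learning model that I have picked as a representative; the quote names no particular model. The learning rate, the deterministic 50% schedule and the response threshold are all assumptions chosen for illustration.

```python
def rescorla_wagner(schedule, v=0.0, rate=0.2, lam=1.0):
    """Update associative strength V trial by trial: V += rate * (target - V).

    `schedule` is a list of booleans: True = reinforced trial, False = not.
    """
    for reinforced in schedule:
        target = lam if reinforced else 0.0
        v += rate * (target - v)
    return v


def trials_to_extinction(v, rate=0.2, threshold=0.1):
    """Count non-reinforced trials until V drops below an assumed response threshold."""
    n = 0
    while v >= threshold:
        v += rate * (0.0 - v)
        n += 1
    return n


continuous = [True] * 100
partial = [i % 2 == 0 for i in range(100)]  # every other trial reinforced

n_cont = trials_to_extinction(rescorla_wagner(continuous))
n_part = trials_to_extinction(rescorla_wagner(partial))
print(n_cont, n_part)  # prints: 11 7
```

With a fixed learning rate the rule predicts that extinction after partial reinforcement is faster, not slower, because partial training leaves a weaker association to decay. That is the opposite of the extinction-prolonging effect Gallistel cites, the kind of gap he says has been closed only by ad hoc parametric assumptions.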

Gallistel’s arguments hinge on trying to understand the detailed mechanisms underlying specific capacities. It’s when the rubber hits the road that the windy generalities of empiricism start looking less than helpful. Sadly, Prinz never really gets down to discussions of mechanisms, i.e. he refuses to play by the rules. Maybe it’s the philosopher in him.

So what does Prinz do instead? He spends a lot of time discussing methodological issues that he hopes will topple the main results. For example, he discusses how difficult it can be to interpret eye gaze, the standard measure used in infant and toddler studies (1547). Eye gaze can be hard to interpret. What change it is indexing can be unclear. Sometimes it indexes stimulus similarity, other times novelty. Sometimes it is hard to tell if it’s tracking a surface change in stimulus or something deeper. And that’s why people who use eye gaze measures try to determine what eye gaze duration is actually indexing in the particular context in which it’s being used. That’s part of good experimental design in these areas. I know this because this is extensively discussed in the lab meetings I sit in on (thanks Jeff) whenever eye gaze duration is used to measure knowledge in the as yet inarticulate. The technique has been used for a long, long time. Hence its potential pitfalls are well known, and for precisely this reason it is very unlikely that all the work that uses it will go down the intellectual drain for the trivial methodological reasons that Prinz cites. To put it bluntly, Baillargeon, Carey, Spelke, Wynn etc. are not experimentally inept. As Prinz notes, there are hundreds (thousands?) of studies using this technique that all point in the same rationalist direction. However blunt a measure eye-gaze is, the number of different kinds of experiments all pointing to the same conclusion is more than a little suggestive. If Prinz wants to bring this impressive edifice crashing down, he needs to do a lot more than note what is common knowledge, viz. that eye gaze needs contextual interpretation.

And of course, Prinz knows this. He is not aiming for victory. He is shooting for a tie (cf. 1512). He doesn’t want to show that rationalists are wrong (just that “they don’t make their case”) and that empiricists are right (ok, he does want this but clearly believes this goal is out of reasonable reach); rather, he wants to muddy the waters, to insinuate that there is less to the myriad rationalist conclusions than meets the eye (and there is a lot here to meet an unbiased eye), and consequently (though this does not follow, as he no doubt knows) that there is more to empiricist conceptions than there appears to be. Why? Because he believes that “empiricism is the more economical theory” (1512) and should be considered superior until rationalists prove they are right.

This strategy, aside from setting a very low bar for empiricist success, conveniently removes the necessity of presenting alternative accounts or mechanisms for any phenomena of interest. Thus, whereas rationalists try to describe human cognitive capacities and explain how they might actually arise, Prinz is satisfied with empiricist accounts that just point out that there is a lot of variation in behavior and gesture towards possible socio-environmental correlates. How this all translates into what people know or why they do what they do is not something Prinz demands of empiricist alternatives.[4] He is playing for a tie, assured in the belief that this is all he needs. Why does he believe this? Because he believes that Empiricism is “a more economical theory.”

Why assume this? Why think that empiricist theories are “simpler”? Prinz doesn’t say, but here is one possible reason: domain specificity in cognition requires an account of its etiology. In other words, how did the innate structure get there (think Minimalism)? But, if this is the reason then it is not domain specificity that is problematic, but any difference in cognitive power between descendant and ancestor. Here’s what I mean.

Say that humans speak language but other animals don’t. Why? Here’s one explanation: we have domain specific structure they don’t. Here’s another, we have computational/statistical capacities they don’t. Why is the second account inherently methodologically superior to the first? The only reason I can think of is that enhanced computational/statistical capacities are understood as differences in degree (e.g. a little more memory) while domain specific structures are understood as differences in kind. The former are taken to be easy to explain, the latter problematic. But is this true?

There are two reasons to think not. Consider the language case. Here’s the big fact: there’s nothing remotely analogous to our linguistic capacities in any other animal. If this is due to just a slight difference in computing capacity (e.g. some fancier stats package, a little more memory) then we need a pretty detailed story demonstrating this. Why? Because it is just as plausible that a little less computing capacity should not result in this apparently qualitative difference in linguistic capacity (indeed, this was the motivation behind the earlier teach-the-chimps/gorillas-to-talk efforts). What we might expect is more along the following lines: slower talk, shorter sentences, fewer ‘you know’s interspersed in speech. But a complete lack of linguistic competence, why expect this? Maybe the difference really is just a little more of what was there before, but, as I said, we need a very good story to accept this. Need I say that none has been provided?
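One way to picture the “a little more memory” scenario is a recognizer for the pattern A^nB^n whose counter has a fixed capacity. Treating memory as a single bounded counter is my simplifying assumption for illustration, not a claim about how language is implemented.

```python
def bounded_anbn(string, capacity):
    """Recognize a^n b^n (n >= 1) with a counter that cannot exceed `capacity`."""
    count, i = 0, 0
    while i < len(string) and string[i] == "a":
        count += 1
        if count > capacity:
            return False  # memory exhausted
        i += 1
    while i < len(string) and string[i] == "b":
        count -= 1
        if count < 0:
            return False  # more b's than a's
        i += 1
    # accept only if the whole string was consumed and the counts matched
    return i == len(string) and count == 0 and len(string) > 0


for cap in (2, 4, 8):
    accepted = [n for n in range(1, 11) if bounded_anbn("a" * n + "b" * n, cap)]
    print(cap, accepted)
```

Shrinking the capacity shortens the longest string recognized but leaves the shorter cases intact: graded degradation of the kind the paragraph above predicts, not wholesale absence.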

Second, there are many different ways of adding to computational/statistical power. For example, some ways of statistically massaging data are computationally far more demanding than others (e.g. it is no secret that Bayesianism, if interpreted as requiring updating of all relevant alternatives, is too computationally expensive to be credible, and that’s why many Bayesians claim not to believe that this is possible).[5] If the alternative to domain specific structure is novel (and special) counting methods, then what justifies the view that the emergence of the latter is easier to explain than the former?
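A rough sketch of why exhaustive updating gets expensive. Assume, for illustration only, that the hypothesis space is every setting of k binary parameters and that each datum is a noisy cue to one parameter; exact Bayesian updating then touches every one of the 2^k hypotheses for every datum. The likelihood function and its numbers are hypothetical.

```python
import itertools


def exact_bayes(num_params, data, likelihood):
    """Exact Bayesian updating over every setting of num_params binary parameters.

    Returns the normalized posterior and the number of likelihood evaluations.
    """
    hypotheses = list(itertools.product([0, 1], repeat=num_params))
    posterior = {h: 1.0 / len(hypotheses) for h in hypotheses}  # uniform prior
    evaluations = 0
    for datum in data:
        for h in hypotheses:  # every alternative is touched for every datum
            posterior[h] *= likelihood(datum, h)
            evaluations += 1
        total = sum(posterior.values())
        for h in hypotheses:
            posterior[h] /= total
    return posterior, evaluations


def noisy_cue(datum, h):
    """Hypothetical likelihood: datum (i, v) is a noisy cue that parameter i is v."""
    i, v = datum
    return 0.9 if h[i] == v else 0.1


data = [(0, 1), (1, 0), (2, 1), (3, 1)]
for k in (4, 8, 12):
    _, cost = exact_bayes(k, data, noisy_cue)
    print(k, cost)  # cost == len(data) * 2**k
```

Each added binary parameter doubles the work per datum. Whether such counting methods are easier to account for than domain specific structure is the question the paragraph above says needs an answer.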

Prinz’s methodological assumption here is not original. Empiricists often assume that rationalism is the more complex hypothesis. But this really depends on the details and no general methodological conclusions are warranted. Sometimes domain specific structures allow for economizing on computational resources.[6] At any rate, none of this can be adjudicated a priori. Specific proposals need to be proposed and examined. This is how the game is played. There are no methodological shortcuts to the argumentative high ground.

This post has gone on far too long. To wrap up then: there is a rising tendency to think well again of ideas that have been deservedly buried. Prinz is the latest herald of the resurrection. However, empiricist conceptions of the mind should be left to molder peacefully with other discredited ideas: flat earth, epicycles, and phlogiston. Whatever the merits of these ideas may once have been (these last three did have some once), they are no longer worth taking seriously. So too with classical empiricism, as Prinz’s book ably demonstrates.

[1] I’m going to drop the naturism/nurturism lingo and just return to the conventional empiricism/rationalism labels.

[2] All references are to the Kindle version.

[3] I’m no expert in these areas but it seems obvious to me that the same can be said about most work in cognitive psychology. Actually, here, if anything, the obsession with dealing with individual differences (aka cognitive variation) has retarded the search for invariances. In the last decades this has been partly assuaged. Baillargeon, Spelke, Carey, Gleitman a.o. have pursued research strategies logically analogous to the one described above in generative grammar.

[4] I should be careful here. I am discussing capacities, but Prinz mainly trucks in behavior (loc 112-132). Most rationalists aim to understand the structure of mental capacities, not behavior. What someone does is only partially determined by his/her mental capacities. Behavior, at least for linguists of the generative variety, is not an object of study, at least at present (and in my own view, never). I am going to assume that Prinz is also interested in capacities, though if he is not then his discussion is irrelevant to most of what rationalists are aiming to understand.

[5] See here, here and here for discussion.

[6] Berwick had an excellent presentation to this effect at the recent LSA meeting in Boston. I’ll see if I can get the slides and post them.

29 comments:

Tim Hunter, February 5, 2013 at 9:31 PM
As regards the question of why the POS argument is so frequently misunderstood, I have to admit that I think we linguists could still do a better job of explaining it. As I see it the crucial point is this:
The problem ... is to explain constrained homophony (i.e. the existence of systematic gaps in sound-meaning pairings). It is not to explain how to affix stars, question marks and other diacritics to sentences.

But to my mind the following way of putting things, although relatively common/standard, muddies the waters a bit:
They confuse data coverage with using data to probe structure. For Chomsky, the contrast between (1) and (2) results from the fact that (1) can be generated by a structure dependent grammar while (2) cannot be. In other words, these differences in acceptability reflect differences in possible generative procedures. It is generative procedures that are the objects of investigation not the acceptability data.
This second quote seems to suggest that while the likes of R&C do in fact account for the acceptability data, there's something else besides the data that they are missing, or they're accounting for the data in some irrelevant or unsatisfying way.

I think the more straightforward way to put things is simply to say that "acceptability" is a property of sound-meaning pairings, not sounds (or strings) alone. Then the acceptability data to account for has the form: these string-meaning pairings are acceptable (i.e. this string goes with that meaning, etc.), and these other string-meaning pairings are not. (The best concrete cases to flesh out this way might be Paul's examples about waffles and parking meters and lost hikers.) No doubt this is what every good linguist has in mind, in some sense, when they make the argument; I'm not trying to say anything new. But this is much simpler, I think, than the usual angle which invites the misunderstanding that getting the right set of strings is "the data", but pairing them with the right meanings is part of "the way you get the data", and there are "good ways" and "bad ways" to get the data.

The fact that we are "using data to probe structure", and that "generative procedures ... are the object of investigation not the acceptability data", is of course relevant, but I think bringing up these issues just makes the objection to string-classifiers seem more philosophical or aesthetic than necessary. A clearer and blunter way to put it is simply that if the data take the form of classifications of string-meaning pairs, then a string-classifier doesn't cover the data.

(To be honest I actually think that quite generally, the notion of acceptability as a property of strings might well do more harm than good. The only thing it seems to mean is derived from the acceptability of string-meaning pairs, i.e. we could define s to be string-acceptable iff there is some m such that (s,m) is acceptable. But we don't find much use for the mirror image notion of meaning-acceptable (i.e. there is some s such that (s,m) is acceptable), and is there reason to think that string-acceptability is any more relevant?)
VilemKodytek, February 7, 2013 at 6:37 AM
Interesting post and discussion!
It leads me to the question whether an A^nB^n grammar can tell neuroscientists anything at all about processing natural language.
AveryAndrews, February 14, 2013 at 3:21 PM
I think one thing that might be useful is for the syntax community to start classifying cases where absence 'easily' counts as evidence, vs those where it does so less easily (in terms of the sophistication of the statistical technology that the learner would need to use in order to interpret the absence as evidence). The cases I class as 'easy' have the following two characteristics:

a) they arise frequently (blocking of regular verb formation rules by irregular forms; basic binding theory)

b) they involve the suppression of one conceivable way of expressing a meaning by another, where a meaning is construed as a collection of lexical meanings plus a scheme of semantic composition. (blocking & binding, again). What makes this 'easy' is that when you hear something and understand it, you can run your grammar in generation mode to find out what other ways it's providing to express the meaning, and if it's possible to change the grammar to reduce the number of alternates without failing to parse existing utterances, you can tighten up your grammar (this is a grammar tightening/debugging method in the LFG/XLE community).

In addition to blocking and binding, other easy cases would be the dative alternation, pro-drop, clitic doubling (but not constraints on clitic combinations), and basic word order constraints (for people who think they are produced by LP constraints added to ID rules).
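The generate-and-prune idea in (b) can be sketched crudely as follows. The grammar, the meaning labels and the pruning criterion are hypothetical simplifications of the description above, not the LFG/XLE tooling itself.

```python
# Hypothetical toy grammar: for each meaning, the set of forms the learner's
# current grammar generates (an overgeneralized past tense included).
GRAMMAR = {
    "GO+PAST": {"went", "goed"},
    "WALK+PAST": {"walked"},
}

# Hypothetical input: (form, meaning) pairs heard and understood.
HEARD = [("went", "GO+PAST"), ("walked", "WALK+PAST"), ("went", "GO+PAST")]


def parses_all(grammar, heard):
    """True if every heard form is generated for its understood meaning."""
    return all(form in grammar.get(meaning, set()) for form, meaning in heard)


def tighten(grammar, heard):
    """For each heard meaning, 'generate' its alternates and drop those never
    heard for it, keeping the change only if all input still parses."""
    tightened = {meaning: set(forms) for meaning, forms in grammar.items()}
    for _, meaning in heard:
        heard_forms = {f for f, m in heard if m == meaning}
        candidate = dict(tightened)
        candidate[meaning] = tightened[meaning] & heard_forms
        if parses_all(candidate, heard):
            tightened = candidate
    return tightened


print(tighten(GRAMMAR, HEARD))
```

This toy drops every unheard alternate, so it would also wrongly discard genuine optionality (say, one variant of the dative alternation) that the learner simply has not heard yet; a realistic version needs some measure of how much evidence an absence amounts to.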

Not easy would include:

1) some of the more complex structures that figure in the Icelandic syntax literature (I conjecture that I might have invented the very first ones ever to have been produced where a secondary adjective predicate shows case agreement with a quirky case-marked PRO subject (hún vonast til að vanta ekki eina/*ein í tímanum) - $20 reward for anyone who produces a naturally occurring example prior to 1976). Hard due to extreme rarity.

2) island constraints: hard since the blocked structures often require considerable reformulation to express their meaning (as pointed out by Fodor & Crain wrt the Uniqueness Principle in their chapter in MacWhinney (1987)).

3) no complex possessors in Piraha: hard because a complex periphrasis involving two clauses is needed to express the meaning of the blocked construction. Easy otoh would be the restrictions on prenominal possessors in German, since semantically equivalent postnominal ones are also available.

Intermediate and unclear cases would include that-trace and wanna contraction, since Gathercole & Zukowski, respectively, have found the relevant data to be quite rare in the input of young children, at least, but they're certainly very common relative to the Icelandic agreement with QC PRO subject ones.
AveryAndrews, February 14, 2013 at 3:24 PM
Oh yes, and the bottom line point being that alleged UG principles whose support comes only from easy (and probably intermediate) cases need to be regarded skeptically until or unless they can get solid support from typology. I'd consider structure dependence of Aux-inversion as an intermediate case that does get the needed support from typology.
AveryAndrews, February 16, 2013 at 11:36 PM
So I started a blog about it:

http://innatelysyntactic.wordpress.com/

I think it will take some time to work through the details.

[if you can seamlessly remove the two posts above plus this, that might be good]