
Monday, May 27, 2013

Showbiz and Bullshit

A while ago I discussed the rising incentive for BS in academic life.  Something that I failed to emphasize is the way that our leading "science" journals have morphed into PR firms, where scoops are the valued currency. The leading science journals (Science, Nature, PNAS) embargo dissemination of information from forthcoming publications until the release date.  The released papers are often part of large PR rollouts intended to wow the public (a kind of PR shock and awe).  It all has the markings of publicity for a new Dan Brown blockbuster or a hot new Hollywood summer movie. What it has little in common with are the virtues that scientists like to claim for themselves and their enterprise.

A recent post by Richard Sproat (here) provides an illustration (thx to David Pesetsky for bringing it to my attention).  What Sproat describes is a typical case of protecting your brand: BS gets published and nothing that calls it out for what it is gets a hearing. Why not?  It would tarnish the Science brand. What would the journal be worth if it became clear that a good chunk of what it published was of little intellectual value? This is a rhetorical question, btw.

There has been a lot of hand wringing over fraud in science. I've talked about this some and evinced my skepticism about how serious this is, especially in cases where it appears that "fraudulent" results replicate (as in Hauser's case, to name an important recent "problem").  Read the Sproat piece and consider which is worse: the systematic suppression of criticism or scientific fraud? Which is systematized?  Which more completely pollutes the data stream?


Not everything in our leading journals is BS.[1]  I would bet that most is not. However, enough just might be to undermine our confidence in the seriousness of our leading publications.  BS is very hard to combat, especially when our journals see themselves as part of the entertainment industry.  Are Science and Nature becoming Variety with formulae?  At least when discussing language issues, it looks like they might be.


[1] This does not mean that all material published on these topics in these journals is BS (see for example the excellent paper by Idsardi and Heinz (here)).

Saturday, May 25, 2013

Formalization and Falsification in Generative Grammar

There's been a very vigorous discussion in the comment sections to this post which expatiates on the falsifiability of proposals within generative grammar of the Chomskyan variety.  It interweaves with a second point: the value of formalization.  The protagonists are Alex Clark and David Pesetsky. It should come as no surprise to readers that I agree with David here. However, the interchange is worth reading and I recommend it to your attention precisely because the argument is a "canonical" one, in the sense that the two sides represent views about the generative enterprise that we will surely hear again (though I hope that very soon David's views prevail, as they have with me).

Though David has said what I would have said (though much better), let me add three points.

First, nobody can be against formalization. There is nothing wrong with it (though there is nothing inherently right about it either).  However, in my experience its value lies not in making otherwise vague theories testable. Indeed, as we generally evaluate a particular formalization in terms of whether it respects the theoretical and empirical generalizations of the account it is formalizing, it is hard to see how formalization per se can be the feature that makes an account empirically evaluable.  This comes out very clearly, for example, in the recent formalization of minimalism by Collins and Stabler. At virtually every point they assure the reader that a central feature of the minimalist program is being coded in such and such a way. And, in doing this, they make formal decisions with serious empirical and theoretical consequences, decisions that could call into question the utility of the particular formalization. For example, the system does not tolerate sidewards movement, and whether this formalization is empirically and theoretically useful may rest on whether UG allows sidewards movement or not. But the theoretical and empirical adequacy of sidewards movement, and of formalizations that encode it or not, is not a question that any given proposed formalization addresses or can address (as Collins and Stabler know). So, whatever the utility of formalization, to date, with some exceptions (I will return to two), it does not primarily lie in making theories that would otherwise be untestable, testable.

So what is the utility? I think that when done well, formalization allows us to clarify the import of our basic concepts. It can lay bare what the conceptual dependencies between our basic concepts are. Tim Hunter's thesis is a good example of this, I think.  His formalization of some basic minimalist concepts allows us to reconceptualize them and consequently extend them empirically (at least in principle).  So too with Alex Drummond's formal work on sidewards movement and Merge over Move. I mention these two because this work was done here at UMD and I was exposed to the thinking as it developed. It is not my intention to suggest that there is not other equally good work out there.

Second, any theory comes to be tested only with the help of very many ancillary hypotheses.  I confess to feeling that lots of critics of Generative Grammar would benefit by reading the work criticizing naive falsificationism (Lakatos, Cartwright, Hacking, and a favorite of mine, Laymon). As David emphasizes, and I could not agree more, it is not that hard to find problems with virtually every proposal. Given this, the trick is to evaluate proposals despite their evident shortcomings.  The true/false dichotomy might be a useful idealization within formal theory, but it badly distorts actual scientific practice where the aim is to find better theories.  We start from the reasonable assumption that our best theories are nonetheless probably false. We all agree that the problems are hard and that we don't know as much as we would like. Active research consists in trying to find ways of evaluating these acknowledged false accounts so that we can develop better ones. And where the improving ideas will come from is often quite unclear.  Let me give a couple of examples of how vague the most progressive ideas can be.

Consider the germ theory of disease. What is it? It entered as roughly the claim that some germs cause some diseases sometimes. Not one of those strongly refutable claims. Important. You bet. It started people thinking in entirely new ways and we are all the beneficiaries of this.

The Atomic Hypothesis is in the same ball park. Big things are made up of smaller things. This was an incredibly important idea (Feynman, I think, thought this was the most important scientific idea ever).  Progress comes from many sources, formalization being but one. And even pretty labile theories can be tested, as, e.g. the germ theory was.

Third: Alex Clark suggests that only formal theories can address learnability concerns. I disagree. One can provide decent evidence that something is not learnable without this (think of Crain's stuff or the conceptual arguments against the learnability of island conditions). This is not to dispute that formal accounts can and have helped illuminate important matters (I am thinking of Yang's stuff in particular, but a lot of stuff done by Berwick and his students is, IMO, terrific). However, I confess that I would be very suspicious of formal learnability results that "proved" that Binding Theory was learnable, or that Movement locality theory (aka Subjacency) was, or that ECP or structure dependence was. The reasons for taking these phenomena as indications of deep grammatical structural principles are so convincing (to me) that they currently form boundary conditions on admissible formal results.

As I said, the discussion is worth reading. I suspect that minds will not be changed, but this does not make going through it (at least once anyhow) any less worthwhile.

Tuesday, May 21, 2013

What Neuroscience knows about behavior

Neuroscientists win Nobels. They get years of the brain, presidents asking for billions of dollars for connectomes and brain atlases, and billion dollar grants (see here) to build computer brains to find out how we think.  Neuroscience is a prestige science. Sadly, linguistics is not.

The paper noted above is the latest indication of the cachet that neuroscience has. However, buried in the article discussing this latest funding coup (btw, I have nothing against this, though I am envious, for none of this money would ever have come to me and better this than another fancy jet or tank) is an indication of how little contemporary neuroscience can tell us about how brains affect behavior or mental capacity. And not because we don't have a full-fledged connectome or map of the brain. Consider the lowly roundworm: full wiring diagram and no idea why it does what it does. Don't take my word for this. Here's Christof Koch, one of the leaders in the field:

 “There are too many things we don’t yet know,” says Caltech professor Christof Koch, chief scientific officer at one of neuroscience’s biggest data producers, the Allen Institute for Brain Science in Seattle. “The roundworm has exactly 302 neurons, and we still have no frigging idea how this animal works.” 

So, next time a neuroscientist tells you that linguistic representations cannot be right because they are incompatible with what we know about brains, worry not. We don't seem to know much about brains, at least where it counts: coupling the structure of brains to what we (or even roundworms) do.

Monday, May 20, 2013

Evans-Levinson: the sound and the fury


I confess that I did not read the Evans and Levinson article (EL) (here) when it first came out. Indeed, I didn’t read it until last week.  As you might guess, I was not particularly impressed. However, not necessarily for the reason you might think. What struck me most is the crudity of the arguments aimed at the Generative Program, something that the (reasonable) commentators (e.g. Baker, Freidin, Pinker and Jackendoff, Harbour, Nevins, Pesetsky, Rizzi, Smolensky and Dupoux a.o.) zeroed in on pretty quickly. The crudity is a reflection, I believe, of a deep-seated empiricism, one that is wedded to a rather superficial understanding of what constitutes a possible “universal.” Let me elaborate.

EL adumbrates several conceptions of universal, all of which the paper intends to discredit. EL distinguishes substantive universals from structural universals and subdivides the latter into Chomsky vs Greenberg formal universals. The paper’s mode of argument is to provide evidence against a variety of claims to universality by citing data from a wide variety of languages, data that, EL appears to believe, demonstrate the obvious inadequacy of contemporary proposals. I have no expertise in typology, nor am I philologically adept. However, I am pretty sure that most of what EL discuss cannot, as it stands, broach many of the central claims made by Generative Grammarians of the Chomskyan stripe. To make this case, I will have to back up a bit and then go on for far too long. Sorry, but another long post. Forewarned, let’s begin by asking a question.

What are Generative Universals (GUs) about?  They are intended to be, in the first instance, descriptions of the properties of the Faculty of Language (FL). FL names whatever it is that humans have as biological endowment that allows for the obvious human facility for language. It is reasonable to assume that FL is both species and domain specific. The species specificity arises from the trivial observation that nothing does language like humans do (you know: fish swim, birds fly, humans speak!). The domain specificity is a natural conclusion from the fact that this facility arises in all humans pretty much in the same way independent of other cognitive attributes (i.e. both the musical and the tone deaf, both the hearing impaired and sharp eared, both the mathematically talented and the innumerate develop language in essentially the same way).  A natural conclusion from this is that humans have some special features that other animals don’t as regards language and that human brains have language specific “circuits” on which this talent rests. Note, this is a weak claim: there is something different about human minds/brains on which linguistic capacity supervenes. This can be true even if lots and lots of our linguistic facility exploits the very same capacities that underlie other forms of cognition. 

So there is something special about human minds/brains as regards language and Universals are intended to be descriptions of the powers that underlie this facility; both the powers of FL that are part of general cognition and those unique to linguistic competence.  Generativists have proposed elaborating the fine structure of this truism by investigating the features of various natural languages and, by considering their properties, adumbrating the structure of the proposed powers. How has this been done? Here again are several trivial observations with interesting consequences.


First, individual languages have systematic properties. It is never the case that, within a given language, anything goes.  In other words, languages are rule governed. We call the rules that govern the patterns within a language a grammar. For generativists, these grammars, their properties, are the windows into the structure of FL/UG. The hunch is that by studying the properties of individual grammars, we can learn about that faculty that manufactures grammars.  Thus, for a generativist, the grammar is the relevant unit of linguistic analysis. This is important. For grammars are NOT surface patterns. The observables linguists have tended to truck in relate to patterns in the data. But these are but way stations to the data of interest: the grammars that generate these patterns.  To talk about FL/UG one needs to study Gs.  But Gs are themselves inferred from the linguistic patterns that Gs generate, which are themselves inferred from the natural or solicited bits of linguistic productions that linguists bug their friends and collaborators to cough up. So, to investigate FL/UG you need Gs and Gs should not be confused with their products/outputs, only some of which are actually perceived (or perceivable).

Second, as any child can learn any natural language, we are entitled to infer, from the intricacies of any given language, powers of FL/UG capable of dealing with such intricacies.  In other words, the fact that a given language does NOT express property P does not entail that FL/UG is not sensitive to P. Why? Because a description of FL/UG is not an account of any given language/G but an account of linguistic capacity in general.  This is why one can learn about the FL/UG of an English speaker by investigating the grammar of a Japanese speaker, and the FL/UG of both by investigating the grammar of a Hungarian, or Swahili, or Slave speaker. Variation among different grammars is perfectly compatible with invariance in FL/UG, as was recognized from the earliest days of Generative Grammar. Indeed, this was the initial puzzle: find the invariance behind the superficial difference!

Third, given that some languages display the signature properties of recursive rule systems (systems that can take their outputs as inputs), it must be the case that FL/UG is capable of concocting grammars that have this property. Thus, whatever G an individual actually has, that individual’s FL/UG is capable of producing a recursive G. Why? Because that individual could have acquired a recursive G even if that individual’s actual G does not display the signature properties of recursion. What are these signature properties?  The usual: unboundedly large and deep grammatical structures (i.e. sentences of unbounded size). If a given language appears to have no upper bound on the size of its sentences, then it's a sure bet that the G that generates the structures of that language is recursive in the sense of allowing structures of type A as parts of structures of type A. This, in general, will suffice to generate unboundedly big and deep structures. Examples of this type of recursion include conjunction, conditionals, embedding of clauses as complements of propositional attitude verbs, relative clauses etc.  The reason that linguists have studied these kinds of configurations is precisely because they are products of grammars with this interesting property, a property that seems unique to the products of FL/UG, and hence capable of potentially telling us a lot about the characteristics of FL/UG.
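To make the "outputs as inputs" point concrete, here is a toy sketch of a recursive rewrite system. The rules, category labels and vocabulary are invented purely for illustration and stand in for no particular analysis:

```python
import random

# A toy grammar fragment: S can be re-embedded under an attitude verb, so the
# output of the S rule can serve as input to it again -- the recursive signature.
RULES = {
    "S":     [["NP", "VP"]],
    "VP":    [["V"], ["V_att", "that", "S"]],   # the second option re-embeds S
    "NP":    [["Mary"], ["John"]],
    "V":     [["left"], ["slept"]],
    "V_att": [["thinks"], ["said"]],
}

def generate(symbol="S"):
    """Expand a symbol by randomly chosen rules until only words remain."""
    if symbol not in RULES:                    # terminal: an actual word
        return [symbol]
    words = []
    for part in random.choice(RULES[symbol]):
        words.extend(generate(part))
    return words

print(" ".join(generate()))
# e.g. "John thinks that Mary said that John left"
```

Because S can reappear inside S, nothing in the rules themselves imposes an upper bound on sentence size.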

Before proceeding, it is worth noting that the absence of these noted signature properties in a given language L does not imply that a grammar of L is not basically recursive.  Sadly, EL seems to leap to this conclusion (443). Imagine that for some reason a given G puts a bound of 2 levels of embedding on any structure in L. Say it does this by placing a filter (perhaps a morphological one) on more complex constructions. Question: what is the correct description of the grammar of L?  Well, one answer is that it does not involve recursive rules for, after all, it does not allow unbounded embedding (by supposition).  However, another perfectly possible answer is that it allows exactly the same kinds of embedding that English does modulo this language specific filter.  In that case the grammar will look largely like the ones that we find in languages like English that allow unbounded embedding, but with the additional filter. There is no reason, just from observing that unbounded embedding is forbidden, to conclude that this hypothetical language L (aka Kayardild or Piraha) has a grammar different in kind from the grammars we attribute to English, French, Hungarian, Japanese etc. speakers.  In fact, there is reason to think that the Gs that speakers of this hypothetical language have do in fact look just like English etc.  The reason is that FL/UG is built to construct these kinds of grammars and so would find it natural to do so here as well.  Of course L would seem to have an added (arbitrary) filter on the embedding structures, but otherwise the G would look the same as the G of more familiar languages. 
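To see how the second answer goes, here is the same toy system with a (purely invented) language-particular cap on embedding bolted on. The recursive rules are untouched; only the superimposed filter differs:

```python
import random

# Same recursive rules as the sketch above (repeated here for convenience).
RULES = {"S": [["NP", "VP"]], "VP": [["V"], ["V_att", "that", "S"]],
         "NP": [["Mary"], ["John"]], "V": [["left"], ["slept"]],
         "V_att": [["thinks"], ["said"]]}

MAX_EMBEDDING = 2   # the hypothetical language-particular filter

def generate_capped(symbol="S", depth=0):
    """Same recursive 'engine' as before; a filter simply blocks the
    re-embedding option once the cap on clausal embedding is reached."""
    if symbol not in RULES:
        return [symbol]
    options = RULES[symbol]
    if symbol == "VP" and depth >= MAX_EMBEDDING:
        options = [opt for opt in options if "S" not in opt]   # the filter
    words = []
    for part in random.choice(options):
        words.extend(generate_capped(part, depth + 1 if part == "S" else depth))
    return words

print(" ".join(generate_capped()))   # never more than 2 levels of embedding
```

The point of the sketch is only this: the bounded and unbounded systems share the same rules, and the difference resides entirely in an extra condition laid on top of them.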

An analogy might help.  I’ve rented cars that have governors on the accelerators that cap speed at 65 mph.  The same car without the governor can go far above 90 mph. Question: do the two cars have the same engine?  You might answer “no” because of the significant difference in upper limit speeds. Of course, in this case, we know that the answer is “yes”: the two cars work in virtually identical ways, have the very same structures but for the governor that prevents the full velocity potential of the rented car from being expressed.  So, the conclusion that the two cars have fundamentally different engines would be clearly incorrect.  Ok: swap Gs for engines and my point is made.  Let me repeat it: the point is not that the Gs/engines might be different in kind, the point is that simple observation of the differences does not license the conclusion that they are (viz. you are not licensed to conclude that they are just finite state devices because they don’t display the signature features of unbounded recursion, as EL seems to).  And, given what we know about Gs and engines, the burden of proof is on those who conclude from such surface differences to deep structural differences.  The argument to the contrary can be made, but simple observations about surface properties just don’t cut it.

Fourth, there are at least two ways to sneak up on properties of UGs: (i) collect a bunch and see what they have in common (what features do all the Gs display) and (ii) study one or two Gs in great detail and see if their properties could be acquired from input data. If any could not be, then these are excellent candidate basic features of FL/UG. The latter, of course, is the province of the POS argument.  Now, note that as a matter of logic the fact that some G fails to have some property P can in principle falsify a claim like (i) but not one like (ii).  Why? Because (i) is the claim that every G has P, while (ii) is the claim that if G has P then P is the consequence of G being the product of FL/UG. Absence of P is a problem for claims like (i) but, as a matter of logic, not for claims like (ii) (recall, If P then Q is true if P is false).  Unfortunately, EL seems drawn to the conclusion that P→Q is falsified if not-P is true. This is an inference that other papers (e.g. Everett’s Piraha work) are also attracted to. However, it is a non-sequitur. 
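For concreteness, here is the standard truth table for material implication (just the textbook logic being invoked, nothing specific to EL's data). Only the second row falsifies P → Q, so a G that lacks P (rows three and four) is simply beside the point for a type (ii) universal.

```latex
\begin{array}{cc|c}
P & Q & P \rightarrow Q \\ \hline
\text{T} & \text{T} & \text{T} \\
\text{T} & \text{F} & \text{F} \\
\text{F} & \text{T} & \text{T} \\
\text{F} & \text{F} & \text{T} \\
\end{array}
```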

EL recognizes that arguing from the absence of some property P to the absence of P-ish features in UG does not hold.  But the paper clearly wants to reach this conclusion nonetheless. Rather than denying the logic, EL asserts that “the argument from capacity is weak” (EL’s emphasis). Why? Because EL really wants all universals to be of the (i) variety, at least if they are “core” features of FL/UG. As these type (i) universals must show up in every G if they are indeed universal, failure to appear in one grammar is sufficient to call into question their universality. EL is clearly miffed that Generativists in general and Chomsky in particular would hold a nuanced position like (ii). EL seems to think that this is cheating in some way.  Why might they hold this? Here’s what I think.

As I discussed extensively in another place (here), everyone who studies human linguistic facility appreciates that competent speakers of a language know more than they have been exposed to.  Speakers are exposed to bits of language and from this acquire rules that generalize to novel exemplars of that language.  No sane observer can dispute this.  What’s up for grabs is the nature of the process of generalization. What separates empiricist from rationalist conceptions of FL/UG is the nature of these inductive processes. Empiricists analyze the relevant induction as a species of pattern recognition. There are patterns in the data and these are generalized to all novel cases.  Rationalists appreciate that this is an option, but insist that there are other kinds of generalizations, those based on the architectural properties (Smolensky and Dupoux’s term) of the generative procedures that FL/UG allows. These procedures need not “resemble” the outputs they generate in any obvious way, and so conceiving this as a species of pattern recognition is not useful (again, see here for more discussion).  Type (ii) universals fit snugly into this second type, and so empiricists won’t like them.  My own hunch is that an empiricist affinity for generalizations based on patterns in the data lies behind EL’s dissatisfaction with “capacity” arguments; they are not the sorts of properties that inspection of cases will make manifest. In other words, the dissatisfaction is generated by Empiricist sympathies and/or convictions which, from where I sit, have no defensible basis. As such, they can be and should be discounted. And in a rational world they would be. Alas…

Before ending, let me note that I have been far too generous to the EL paper in one respect.  I said at the outset that its arguments are crude. How so?  Well, I have framed the paper’s main point as a question about the nature of Gs. However, most of the discussion is framed not in terms of the properties of the Gs they survey but in terms of surface forms that Gs might generate.  Their discussion of constituency provides a nice example (441).  They note that some languages display free word order and conclude from this that they lack constituents.  However, surface word order facts cannot possibly provide evidence for this kind of conclusion; they can only tell us about surface forms. It is consistent with this that elements that are no longer constituents on the surface were constituents earlier on and were then separated, or will become constituents later on, say on the mapping to logical form.  Indeed, in one sense of the term constituent, EL insists that discontinuous expressions are such, for they form units of interpretation and agreement. The mere fact that elements are discontinuous on the surface tells us nothing about whether they form constituents at other levels. I would not mention this were it not the classical position within Generative Grammar for the last 60 years: surface syntax is not the arbiter of constituency, at least if one has a theory of levels, as virtually every theory that sees grammars as rules that relate meaning with sounds assumes (EL assumes this too).  There is nary a grammatical structure in EL, and this is what I meant by my being overgenerous. The discussion above is couched in terms of Gs and their features. In contrast, most of the examples in EL are not about Gs at all, but about word strings. However, as noted at the outset, the data relevant to FL/UG are Gs, and the absence of G-ish examples in EL makes most of EL’s cited data irrelevant to Generative conceptions of FL/UG.

Again, I suspect that the swapping of string data for G data simply betrays a deep empiricism, one that sees grammars as regularities over strings (string patterns) and FL/UG as higher order regularities over Gs. Patterns within patterns within patterns. Generativists have long given up on this myopic view of what can be in FL/UG.  EL does not take the Generative Program on its own terms and show that it fails. It outlines a program that Generativists don’t adopt and then shows that it fails by standards they have always rejected, using data that are nugatory.

I end here: there are many other criticisms worth making about the details, and many of the commentators on the EL piece, better placed than I am to make them, do so. However, to my mind, the real difficulty with EL is not at the level of detail. EL’s main point as regards FL/UG is not wrong, it is simply beside the point.  A lot of sound and fury signifying nothing.

Wednesday, May 15, 2013

What's a result?


I recently read an intellectual biography of Feynman by Lawrence Krauss (here) and was struck by the following contrast between physics and linguistics. In linguistics, or at least in syntax, if a paper covers the very same ground as some previously published piece of work, it is considered a failure. More exactly, unless a paper can derive something novel (admittedly, the novelty can be piddling), preferably something that earlier alternatives cannot (or do not[1]) get, the paper will have a hard time getting published.  Physicists, in contrast, greatly value research that derives/explains already established results/facts in novel ways. Indeed, one of Feynman’s great contributions was to recast classical quantum mechanics (in terms of the Schrodinger equation) in terms of Lagrangians that calculate probability amplitudes.  At any rate, this was considered an important and worthwhile project and it led, over time, to whole new ways of thinking about quantum effects (or so Krauss argues).  If Feynman had been a syntactician he would have been told that simply re-deriving the Schrodinger equation is not in and of itself enough: you also have to show that the novel recasting could do things that the classical equation could not. I can hear it now: “As you simply rederive the quantum effects covered by the Schrodinger equation, no PhD for you Mr Feynman!”

Now, I have always found this attitude within linguistics/syntax (ling-syn) rather puzzling. Why is deriving a settled effect in a different way considered so uninteresting? At least in ling-syn? Consider what happens in our “aspirational peers” in, say, math. There are about a hundred proofs of the Pythagorean theorem (see here) and, I would bet, if someone came up with another one tomorrow it could easily get published. Note, btw, we already know that the square of the hypotenuse of a right-angled triangle is equal to the sum of the squares of the other two sides (in fact we’ve known this for a very long time), and nonetheless, alternative proofs of this very well known and solid result are still noteworthy, at least to mathematicians.  Why?  Because what we want from a proof/explanation involves more than the bottom (factual) line. Good explanations/proofs show how fundamental concepts relate to one another. They expose the fine structure and the fault lines of the basic ideas/theories that we are exploring. Different routes to the same end not only strengthen our faith in the correctness of the derived fact(oid), they also, maybe more importantly, demonstrate the inner workings of our explanatory apparatus.

Interestingly, it is often the proof form rather than the truth of the theorem that really matters. I recall dimly that when the four color problem was finally given a brute force computer solution by cases, NPR interviewed a leading topologist who commented that the nature of the proof indicated that the problem was not as interesting as had been supposed! So, that one can get to Rome is interesting. However, no less interesting is the fact that one can get there in multiple ways. So, even if the only thing a novel explanation explains is something that has been well explained by another extant story, the very fact that one can get there from varying starting points is interesting and important. It is also fun. As Feynman put it: “There is a pleasure in recognizing old things from a new viewpoint.” But, for some reason, my impression is that the ling-syn community finds this unconvincing. 

The typical ling-syn paper is agonistic. Two (or more) theories are trotted out to combat one another. The accounts are rhetorically made to face off and data is thrown at them until only one competitor is left standing, able to “cover the facts.”  In and of itself, trial by combat need not be a bad way to conduct business. Alternatives often mutually illuminate by being contrasted, and comparison can be used to probe the inner workings so that the bells and whistles that make each run can be better brought into focus.

However, there is also a downside to this way of proceeding. Ideas have an integrity of their own which supports different ways of packaging thoughts.  These packages can have differing intellectual content and disparate psychological powers.  Thus, two accounts that get all the same effects might nonetheless spur the imagination differently and, for example, more or less easily suggest different kinds of novel extensions.  Many ways of conceptualizing a problem, especially if they are built from (apparently) different building blocks (e.g. operations, first principles, etc.), may all be worth preserving and developing even if one seems (often temporarily) empirically superior. The ling-syn community suffers from premature rejection; the compulsion to quickly declare a single winner. This has the side effect of entrenching previous winners and requiring novel challengers to best them in order to get a hearing.

Why is the ling-syn community so disposed?  I’m not sure, but here is a speculation. Contrary to received opinion, ling-syns don’t really value theory. In fact, until recently there hasn’t been much theory to speak of. Part of the problem is that ling-syns confuse ‘formal’ with ‘theoretical.’ For example, there is little theoretical difference between many forms of GPSG, HPSG, LFG, RG, and GB, though you’d never know this from the endless discussions over “framework” choice. The differences one finds here are largely notational, IMO, so there is little room for serious theoretical disagreement. 

When this problem is finessed, a second arises. There is still in generative linguistics a heavy premium on correct description. Theory is tolerated when it is useful for describing the hugely variable flora and fauna that we find in language. In other words, theory in the service of philology is generally acceptable.  Theory in the service of discovering new facts is also fine. But too much of an obsession with the workings of the basic ideas (what my good and great friend Elan Dresher calls “polishing the vessels”) is quite suspect, I believe. As ‘getting there in different ways’ is mainly of value in understanding how our theoretical concepts fit together (i.e. is mainly of theoretical/conceptual value), this kind of work is devalued unless it can also be shown to have languistic consequences.

Until recently, the baleful effects of this attitude have been meager. Why? Because ling-syn has actually been theory poor. Interesting theory generally arises when apparently diverse domains with their own apparently diverse “laws” are unified (e.g. Newtonian theory unified terrestrial and celestial mechanics, Maxwell’s unified electricity and magnetism). Until recently there were no good candidate domains for unification (Islands being the exception). As I’ve argued in other places, one feature of the minimalist program is the ambition to unify the apparently disparate domains/modules of GB, and for this we will need serious theory. And to do this we will need to begin to more highly value attempts to put ideas together in novel ways, even if for quite a long while they do no better (and maybe a tad worse) than our favorite standard accounts.


[1] The two are very different. Data is problematic when inconsistent with the leading ideas of an account. These kinds of counter-examples are actually pretty hard to concoct. 

Thursday, May 9, 2013

Does NPR care about credibility? Apparently not!

NPR (here) reports on recent research purporting to show that there is no faculty of language.  What's the new evidence? Well, our new fancy shmancy imaging techniques allow us to look directly into brains and so we went to look for FL and, guess what? We failed to find it! Ergo, no "special module for language." In what did the failure consist? Here's the choice passage:


"But in the 1990s, scientists began testing the language-module theory using "functional" MRI technology that let them watch the brain respond to words. And what they saw didn't look like a module, says Benjamin Bergen, a researcher at the University of California, San Diego, and author of the book Louder Than Words.
"They found something totally surprising," Bergen says. "It's not just certain specific little regions in the brain, regions dedicated to language, that were lighting up. It was kind of a whole-brain type of process." "
So, the whole brain lights up when you hear a sentence and so there is no language module. Well, when you drive to Montreal from DC the whole car moves, so it cannot have a fuel system, right? What's to be done? Thankfully, Greg Hickok (quickly) walks us through the morass (here).  And yes, the confusion is breathtaking.  Maybe NPR might wish to consult a few others when reporting this sort of stuff as exciting neuroscience. Yes, colored brains sell, even on radio. But isn't the aim of NPR to inform rather than simply titillate? 

Phases: some questions


One of the nice things about conferences is that you get to bump into people you haven’t seen for a while. This past weekend, we celebrated our annual UMD Mayfest (it was on prediction in ling sensitive psycho tasks) and, true to form, one of the highlights of the get together was that I was able to talk to Masaya Yoshida (a syntax and psycho dual threat at Northwestern) about islands, subjacency, phases and the argument-adjunct movement asymmetry.  At any rate, as we talked, we started to compare Phase Theory with earlier approaches to strict cyclicity (SC) and it again struck me how unsure I am that the new fangled technology has added to our stock of knowledge.  And, rather than spending hours upon hours trying to figure this out solo, I thought that I would exploit the power of crowds and ask what the average syntactician in the street thinks phases have taught us above and beyond standard GB wisdom.  In other words, let’s consider this a WWGBS (what would GB say) moment (here) and ask what phase wise thinking has added to the discussion.  To set the stage, let me outline how I understand the central features of phase theory and also put some jaundiced cards on the table, repeating comments already made by others. Here goes.

Phases are intended to model the fact that grammars are SC. The most impressive empirical reflex of this is successive cyclic A’-movement.  The most interesting theoretical consequence is that SC grammatical operations bound the domain of computation thereby reducing computational complexity.  Within GB these two factors are the province of bounding theory, aka Subjacency Theory (ST). The classical ST comes in two parts: (i) a principle that restricts grammatical commerce (at least movement) to adjacent domains (viz. there can be at most one bounding node (BN) between the launch site and target of movement) and (ii) a metric for “measuring” domain size (viz. the unit of measure is the BN and these are DP, CP, (vP), and maybe TP and PP).[1] Fix the bounding nodes within a given G and one gets locality domains that undergird SC. Empirically A’-movement applies strictly cyclically because it must given the combination of assumptions (i) and (ii) above.
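For concreteness, (i) and (ii) can be cartooned in a few lines of code; the path encoding and the example categories are invented purely for illustration and nothing hangs on them:

```python
# A toy rendering of clauses (i)-(ii) of the classical ST as described above:
# movement is licit only if at most one bounding node intervenes between the
# launch site and the target.
BOUNDING_NODES = {"DP", "CP"}     # per (ii); a particular G may add TP or PP per (v)

def subjacent(nodes_between):
    """nodes_between: labels of the nodes separating launch site and target.
    Clause (i): at most one of them may be a bounding node."""
    crossed = [n for n in nodes_between if n in BOUNDING_NODES]
    return len(crossed) <= 1

# One-fell-swoop extraction out of a clause embedded inside a DP crosses both
# CP and DP, so it is barred; crossing a single clause boundary is fine.
print(subjacent(["VP", "CP", "DP"]))   # False: two BNs intervene
print(subjacent(["VP", "CP"]))         # True: one BN intervenes
```

Longer dependencies must therefore be broken into steps each of which satisfies the condition, which is where escape hatches (below) come in.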

Now, given this and a few other assumptions, it is also possible to model island effects in a unified way.  The extra assumptions are: (iii) some BNs have “escape hatches” through which a moving element can move from one cyclic domain to another (viz. CP but crucially not DP) (iv) escape hatches can accommodate varying numbers of commuters (i.e. the number of exits can vary; English is thought to have just one, while multiple WH fronting languages have many). If we add a further assumption – (v) DP and CP (and vP) are universally BNs but Gs can also select TP and PP as BNs – the theory allows for some typological variation.[2] (i)-(v) constitutes the classical Subjacency theory. Btw, the reconstruction above is historically misleading in one important way.  SC was seen to be a consequence of the way in which island effects were unified. It’s not that SC was modeled first and then assumptions added to get islands; rather, the reverse: the primary aim was to unify island effects and a singular consequence of this effort was SC. Indeed, it can be argued (in fact I would so argue) that the most interesting empirical support for the classical theory was the discovery of SC movement.

One of the hot debates when I was a grad student was whether long distance movement dependencies were actually SC. Kayne and Pollock and Torrego provided (at the time surprising) evidence that it was, based on SC inversion operations in French and Spanish.  Chung supplied Comp agreement evidence from Chamorro to the same effect.  This, added to the unification of islands, made ST the jewel in the GB crown, both theoretically and empirically. Given my general rule of thumb that GB is largely empirically accurate, I take it as relatively uncontroversial that any empirically adequate theory of FL must explain why Gs are SC.

As noted in a previous post (here), ST developed and expanded.  But let’s leave history behind and jump to the present. Phase Theory (PT) is the latest model for SC. How does it compare with ST?  From where I sit, PT looks almost isomorphic to it, or at least a version that extends to cover island effects does.  A PT of this ilk has CP, vP and DP as phases.[3] It incorporates the Phase Impenetrability Condition (PIC), which requires that interacting expressions be in (at most) adjacent phases.[4] Distance is measured from one phase edge to the next (i.e. complements to phase heads are grammatically opaque, edges are not). This differs from ST in that the cyclic boundary is the phase/BN head rather than the MaxP of the Phase/BN head, but this is a small difference technically. PT also assumes “escape hatches” in the sense that movement to a phase edge moves an expression from inside one phase into the next higher phase domain and, as in ST, different phases have different available edges suitable for “escape.”  If we assume that Cs have different numbers of available phase edges and we assume that D has no such available edges at all, then we get a theory effectively identical to the ST.  In effect, we traded phase edges for escape hatches and the PIC for (i).[5]
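And here is the corresponding cartoon of the PIC as just stated (interacting expressions must be in at most adjacent phases); putting it next to the ST cartoon above makes the near-isomorphism visible. Again, the encoding is entirely invented:

```python
# A crude rendering of the (weak) PIC as described above.  Phases are numbered
# bottom-up along the clausal spine; nothing else about phase structure is modeled.
def pic_ok(probe_phase, goal_phase):
    """A probe may interact with a goal in its own phase or in the immediately
    preceding (adjacent) phase, but not with anything buried further down."""
    return 0 <= probe_phase - goal_phase <= 1

print(pic_ok(probe_phase=2, goal_phase=1))   # True: adjacent phases
print(pic_ok(probe_phase=3, goal_phase=1))   # False: goal is two phases down
```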
There are a few novelties in PT, but so far as I can tell they are innovations compatible with ST. The two most distinctive innovations regard the nature of derivations and multiple spell out (MSO). Let me briefly discuss each, in reverse order.

MSO is a revival of ideas that go back to Ross, but with a twist.  Uriagereka was the first to suggest that derivations progressively make opaque parts of the derivation by spelling them out (viz. spell out (SO) entails grammatical inaccessibility, at least to movement operations).  This is not new.  ST had the same effect, as SC progressively makes earlier parts of the derivation inaccessible to later parts.  PT, however, makes earlier parts of the derivation inaccessible by disappearing the relevant structure.  It’s gone, sent to the interfaces and hence no longer part of the computation.  This can be effected in various ways, but the standard interpretations of MSO (due to Chomsky and quite a bit different from Uriagereka’s) have coupled SO with linearization conditions in some way (Uriagereka does this as do Fox and Pesetsky, in a different way). This has the empirical benefit of allowing deletion to obviate islands. How? Deletion removes the burden of PF linearization, and if what makes an island an island are the burdens of linearization (Uriagereka) or frozen linearizations (Fox and Pesetsky), then as deletion obviates the necessity of linearization, island effects should disappear, as they appear to do (Ross was the first to note this (surprise, surprise) and Merchant and Lasnik have elaborated his basic insight for the last decade!). At any rate, interesting though this is (and it is very interesting IMO), it is not incompatible with ST. Why? Because ST never said what made an island an island, or more accurately, what made earlier cyclic material unavailable to later parts of the computation (i.e. it had no real theory of inaccessibility, just a picture), and it is compatible with ST that it is PF concerns that render earlier structure opaque. So, though PT incorporates MSO, it is something that could have been added to ST and so is not an intrinsic feature of PT accounts. In other words, MSO does not follow from other parts of PT any more than it does from ST. It is an add-on; a very interesting one, but an add-on nonetheless.[6]

Note, btw, that MSO accounts, just like ST, require a specification of when SO occurs. It occurs cyclically (i.e. either at the end of a relevant phase, or when the next phase head is accessed) and this is how PT models SC. 

The second innovation is that phases are taken to be the units of computation.  In Derivation by Phase, for example, operations are complex and non-Markovian within the phase.  This is what I take Chomsky to mean when he says that operations in a phase apply “all at once.” Many apply simultaneously (hence not one “line” at a time) and they have no order of application. I confess to not fully understanding what this means. It appears to require a “generate and filter” view of derivations (e.g. intervention effects are filters rather than conditions on rule application).  It is also the case that SO is a complex checking operation where features are inspected and vetted before being sent for interpretation.  At any rate, the phase is a very busy place: multiple operations apply all at once; expressions are E- and I-merged, features checked and shipped.

This is a novel conception of the derivation, but again, it is not inherent in the punctate nature of PT.[7] Thus, PT has various independent parts, one of which is isomorphic to traditional ST and other parts that are logically independent of one another and of the ST-similar part. That which explains SC is the same as what we find in ST and is independent of the other moving parts. Moreover, the parts of PT isomorphic to ST seem no better motivated (and no worse) than the analogous features in ST: e.g. why the BNs are just these has no worse answer within ST than the question of why the phase heads are just those.

That’s how I see PT.  I have probably skipped some key features. But here are some crowd directed questions: What are the parade cases empirically grounding PT? In other words, what’s the PT analogue of affix hopping? What beautiful results/insights would we lose if we just gave PT up? Without ST we lose an account of island effects and SC. Without PT we lose…? Moreover, are these advantages intrinsic to minimalism or could they have already been achieved in more or less the same form within GB? In other words, is PT an empirical/theoretical advance or just a rebranding of earlier GB technology/concepts (not that there is anything intrinsically wrong with this, btw)?  So, fellow minimalists, enlighten me. Show me the inner logic, the “virtual conceptual necessity” of the PT system as well as its empirical virtues. Show me in what ways we have advanced beyond our earlier GB bumblings and stumblings. Inquiring minimalist minds (or at least one) want to know.



[1] This “history” compacts about a decade of research and is somewhat anachronistic.  The actual history is quite a bit more complicated (thanks Howard).
[2] Actually, if one adds vP as a BN then Rizzi-like differences between Italian and English cannot be accommodated. Why? Because, once one moves into an escape hatch, movement is thereafter escape hatch to escape hatch, as Rizzi noted for Italian. The option of moving via CP is only available for the first move. Thereafter, if CP is a BN, movement must be CP to CP. If vP is added as a BN then it is the first available BN and whether one moves through it or not, all CP positions must be occupied. If this is too much “inside baseball” for you, don’t sweat it. Just the nostalgic reminiscences of a senior citizen.
[3] vP is an addition from Barriers versions of ST, though how it is incorporated into PT is a bit different from how vP acted in ST accounts.
[4] There are two versions of the PIC, one that restricts grammatical commerce to expressions in the same phase and a looser one that allows expressions in adjacent phases to interact. The latter is what is currently assumed (for pretty meager empirical reasons IMO – Nominative object agreement in quirky subject transitive sentences in Icelandic, I think).
[5] As is well known, Chomsky has been reluctant to extend phase status to D. However, if this is not done then PT cannot account for island effects at all, and this removes one of the more interesting effects of cyclicity. There have been some allusions to the possibility that islands are not cyclicity effects, indeed not even grammatical effects.  However, I personally find the latter suggestion most implausible (see the forthcoming collection on this edited by Jon Sprouse and yours truly: out sometime in the fall). As for the former, well, if islands are grammatical effects (and like I said, the evidence seems to me overwhelming), then if PT does not extend to cover these it is less empirically viable than ST.  This does not mean that it is wrong to divorce the two, but it does burden the revisionist with a pretty big theoretical note payable.
[6] MSO is effectively a theory of the PIC. Curiously, from what I gather, current versions of PT have begun mitigating the view that SO removes structure by sending it to the interfaces. The problem is that such early shipping makes linearization problematic.  It also necessitates processes by which spelled out material is “reassembled” so that the interfaces can work their interpretive magic (think binding, which holds across phases, or clausal intonation, which is also defined over the entire sentence, not just a phase).
[7] Nor is the assumption that lexical access is SC (i.e. the numeration is accessed in phase sized chunks). This is roughly motivated by (IMO weak) conceptual reasons concerning SC arrays reducing computational complexity and by empirical facts about Merge over Move (btw: does anyone except me still think that Merge over Move regulates derivations?).