Monday, August 21, 2017

Language vs linguistics, again; the case of Christiansen and Chater

Morten Christiansen and Nick Chater have done us all a favor. They have written a manifesto (here, C&C) outlining what they take to be a fruitful way of studying language. To the degree that I understand it, it seems plausible enough given its apparent interests. It focuses on the fact that language as we encounter it on a daily basis is a massive interaction effect and the manifesto heroically affirms the truism (just love those papers that bravely go out on a limb (the academic version of exercise?)) that explaining interaction effects requires an “integrated approach.” Let me emphasize how truistic this is: if X is the result of many interacting parts then only a story that enumerates these parts, describes their properties and explains how they interact can explain an effect that is the result of interacting parts interacting. Thus a non-integrated account of an interaction effect is a logical non-starter. It is also worth pointing out the obvious: this is not a discovery, it is a tautology (and a rather superficial one at that), and not one that anyone (and here I include C&C’s bête noire Monsieur Chomsky (I am just back from vacationing in Quebec so excuse the francotropism)) can, should or would deny (in fact, we note below that he made just this observation oh so many (over 60) years ago!).

That said, C&C, from where I sit, does make two interesting moves that go beyond the truistic. The first is that it takes the central truism to be revolutionary and in need of defence (as if anyone in their right mind would ever deny it). The second noteworthy feature is that the transparent truth of the truism (note truisms need not be self-evident (think theorems) but this one is) seems to license a kind of faith based holism, one that goes some distance in thwarting the possibility of a non-trivial integrated approach of the kind C&C urges. 

Before elaborating these points in more detail, I need (the need here is pathetically psychological, sort of like a mental tic, so excuse me) to make one more overarching point: C&C appears to have no idea what GG is, what its aims are or what it has accomplished over the last 60 years. In other words, when C&C talks about GG (especially (but not uniquely) about the Chomsky program) it is dumb, dumb, dumb! And it is not even originally dumb. It is dumb in the old familiar way. It is boringly dumb. It is stale dumb. Dumb at one remove from other mind numbing dumbness. Boringly, predictably, pathetically dumb. It makes one wonder whether or not the authors ever read any GG material. I hope not. For having failed to read what it criticizes would be the only half decent excuse for the mountains of dumb s**t that C&C asserts. If I were one of the authors, I would opt for intellectual irresponsibility (bankruptcy (?)) over immeasurable cluelessness if hauled before a court of scientific inquiry. At any rate, not having ever read the material better explains the wayward claims confidently asserted than having read it and so misconstrued it. As I have addressed C&C’s second hand criticisms more than (ahem) once, I will allow the curious to scan the archives for relevant critical evisceration.[1]

Ok, the two main claims: It is a truism that language encountered “in the wild” is the product of many interacting parts. This observation was first made in the modern period in Syntactic Structures. In other words, a long time ago.[2] In other words, the observation is old, venerable and, by now, commonplace. In fact, the distinction between ‘grammatical’ and ‘acceptable’ first made over 60 years ago relies on the fact that a speaker’s phenomenology wrt utterances is not exclusively a function of an uttered sentence’s grammatical (G) status. Other things matter, a lot. In the early days of GG, factors such as memory load, attention, pragmatic suitability, semantic sensibility (among other factors) were highlighted in addition to grammaticality. So, it was understood early on that many many factors went into an acceptability judgment, with grammaticality being just one relevant feature. Indeed, this observation is what lies behind the competence/performance distinction (a point that C&C seems not to appreciate (see p. 3)), the distinction aiming to isolate the grammatical factors behind acceptability, thereby, among other things, leaving room for other factors to play a role.[3]

And this was not just hand waving or theory protecting talk (again contra C&C, boy is its discussion DUMB!!). A good deal of work was conducted early on trying to understand how grammatical structure could interact with these other factors to burden memory load and increase perceived unacceptability (just think of the non-trivial distinction between center and self embedding and its implications for memory architecture).[4] This kind of work proceeds apace even today, with grammaticality understood to be one of the many factors that go into making judgments gradiently acceptable.[5] Indeed, there is no grammatically informed psycho-linguistic work done today (or before) that doesn’t understand that G/UG capacities are but one factor among others needed to explain real time acquisition, parsing, production, etc. UG is one factor in accounting for G acquisition (as Jeff Lidz, Charles Yang, Lila Gleitman etc. have endlessly emphasized) and language particular Gs are just one factor in explaining parsability (which is, in turn, one factor in underlying acceptability) (as Colin Phillips, Rick Lewis, Shravan Vasishth, Janet Fodor, Bob Berwick, Lyn Frazier, Jon Sprouse, etc. etc. etc. have endlessly noted). Nobody denies the C&C truism that language use involves multiple interacting variables. Nobody is that stupid!

So, C&C is correct in noting that if one’s interest is in figuring out how language is deployed/acquired/produced/parsed/etc. then much more than a competence theory will be required. This is not news. This is not even an insight. The question is not if this is so, but how it is so. Given this, the relevant question is: what tree is C&C barking up by suggesting that this is contentious?

I have two hypotheses. Here they are.

1. C&C doesn’t take G features to be at all relevant to acceptability.
2. C&C favors a holistic rather than an analytic approach to explaining interaction effects in language.

Let’s discuss each in turn.

C&C is skeptical that grammaticality is a real feature of natural language expressions. In other words, C&C's beef with the traditional GG conception in which G/UG properties are one factor among many lies with assigning G/UG any role at all. This is not as original as it might sound. In fact, it is quite a traditional view, one that Associationists and Structuralists held about 70 years ago. It is the view that GG defenestrated, but apparently, did not manage to kill (next time from a higher floor please). The view amounts to the idea that G regularities (C&C is very skeptical that UG properties exist at all, I return to this presently) are just probabilistic generalizations over available linguistic inputs. This is the view embodied in Structuralist discovery procedures (and suggested in current Deep Learning approaches) wherein levels were simple generalizations over induced structures of a previous lower level. Thus, all there is to grammar is successively more abstract categories built up inductively from lower level less abstract categories. On this view, grammatical categories are classes of words, which are definable as classes of morphemes, which are definable as classes of phonemes, which are definable as classes of phones. The higher levels are, in effect, simple inductive generalizations over lower level entities. The basic thought is that higher-level categories are entirely reducible to lower level distributional patterns. Importantly, in this sort of analysis, there are no (and can be no) interesting theoretical entities, in the sense of real abstract constructs that have empirical consequences but are not reducible or definable in purely observational terms. In other words, on this view, syntax is an illusion and the idea that it makes an autonomous contribution to acceptability is a conceptual error.

Now, I am not sure whether C&C actually endorses this view, but it does make noises in that direction. For example, it endorses a particular conception of constructions and puts it “at the heart” of its “alternative framework” (4). The virtue of C&C constructions is that they are built up from smaller parts in a probabilistically guided manner. Here is C&C (4):

At the heart of this emerging alternative framework are constructions, which are learned pairings of form and meaning ranging from meaningful parts of words (such as word endings, for example, ‘-s’, ‘-ing’) and words themselves (for example, ‘penguin’) to multiword sequences (for example, ‘cup of tea’) to lexical patterns and schemas (such as, ‘the X-er, the Y-er’, for example, ‘the bigger, the better’). The quasi-regular nature of such construction grammars allows them to capture both the rule-like patterns as well as the myriad of exceptions that often are excluded by fiat from the old view built on abstract rules. From this point of view, learning a language is learning the skill of using constructions to understand and produce language. So, whereas the traditional perspective viewed the child as a mini-linguist with the daunting task of deducing a formal grammar from limited input, the construction-based framework sees the child as a developing language-user, gradually honing her language-processing skills. This requires no putative universal grammar but, instead, sensitivity to multiple sources of probabilistic information available in the linguistic input: from the sound of words to their co-occurrence patterns to information from semantic and pragmatic contexts.

This quote does not preclude a distinctive Gish contribution to acceptability, but its dismissal of any UG contribution to the process suggests that it is endorsing a very strong rejection of the autonomy of syntax thesis.[6] Let me repeat, a commitment to the centrality of constructions does not require this. However, the C&C version seems to endorse it. If this is correct, then C&C sees the central problem with modern GG as its commitment to the idea that syntactic structure is not reducible to either statistical distributional properties or semantic or pragmatic or phonological or phonetic properties of utterances. In other words, C&C rejects the GG idea that grammatical structure is real and makes any contribution to the observables we track through acceptability.

This view is indeed radical, and virtually certain to be incorrect.[7] If there is one thing that all linguists agree on (including constructionists like Jackendoff and Culicover) it’s that syntax is real. It is not reducible to other factors. And if this is so, then G structure exists independently of other factors. I also think that virtually all linguists believe that syntax is not the sum of statistical regularity in the PLD.[8] And there is good reason for this; it is morally certain that many of the grammatical factors that linguists have identified over the last 60 years have linguistically proprietary roots and leave few footmarks in the PLD. To argue that this standard picture is false requires a lot of work, none of which C&C does or points to. Of course, C&C cannot be held responsible for this failing, for C&C has no idea what this work argues because C&C’s authors appear never to have read any of it (or, if it has been read, it has not been understood, see above). But were C&C informed by any of this work, it would immediately appreciate that it is nuts to think that it is possible to eliminate G features as one factor in acceptability.[9]

In sum, one possible reading of C&C is that it endorses the old Structuralist idea of discovery procedures, denies the autonomy of syntax thesis (i.e. the thesis that syntax is “real”) and believes in (yes I got to say it) the old Empiricist/Associationist trope that language capacity is nothing but a reflection of tracked statistical regularities. It’s back folks. No idea ever really dies, no matter how unfounded and implausible and how many times it has been stabbed through the heart with sharp arguments.

Before going on to the second point, let me add a small digression concerning constructions. Look, anyone who works on the G of a particular language endorses some form of constructionism (see here for some discussion). Everyone assumes that morphemes have specific requirements, with specific selection restrictions. These are largely diacritical and part of the lexical entry of the morpheme. Gs are often conceived as checking these features in the course of a derivation and one of the aims of a theory of Gs (UG) is to specify the structural/derivational conditions that regulate this feature checking. Thus, everyone’s favorite language specific G has some kinds of constructions that encode information that is not reducible to FL or UG principles (or not so reducible as far as we can tell). 

Moreover, it is entirely consistent with this view that units larger than morphemes code this kind of information. The diacritics can be syncategorematic and might grace structures that are pretty large (though given something like an X’ syntax with heads or a G with feature percolation the locus of the diacritical information can often be localized on a “listable” linguistic object in the lexicon). So, the idea that C&C grabs with both hands and takes to be new and revolutionary is actually old hat. What distinguishes the kind of constructionism one finds in C&C from the more standard variety found in standard work is the idea central to GG that constructions are not “arbitrary.” Rather, constructions have a substructure regulated by more abstract principles of grammar (and UG). C&C seems to think that anything can be a construction. But we know that this is false.[10] Constructions obey standard principles of Grammar (e.g. no mirror image constructions, no constructions that violate the ECP or binding theory, etc.). So though there can be many kinds of constructions that compile all sorts of diverse information there are some pretty hard constraints regulating what a possible construction is.

Why do I mention this? Because I could not stop myself! Constructions lie at the heart of C&C’s “alternative framework” and nonetheless C&C has no idea what they are, that they are standard fare in much of standard GG (even minimalist Gs are knee deep in such diacritical features) and that they are not the arbitrary pairings that C&C takes them to be. In other words, once again C&C is mind numbingly ignorant (or, misinformed).

So that’s one possibility. C&C denies G properties are real. There is a second possible assumption, one that does not preclude this one and is often found in tandem with it, but is nonetheless different. The second problem C&C sees with the standard view lies with its analytical bent. Let me explain.

The standard view of performance within linguistics is that it involves contributions of many factors. Coupled with this is a methodology: The right way to study these is to identify the factors involved, figure out their particular features and see how they combine in complex cases. One of the problems with studying such phenomena is that the interacting factors don’t always nicely add up. In other words, we cannot just add the contributions of each component together to get a nice well-behaved sum at the end. That’s what makes some problems so hard to solve analytically (think turbulence). But, that’s still the standard way to go about matters.  GG endorsed this view from the get-go. To understand how language works in the wild, figure out what factors go into making, say, an utterance, and see how these factors interact. Linguists focused on one factor (G and UG) but understood that other factors also played a role (e.g. memory, attention, semantic/pragmatic suitability etc.). The idea was that in analyzing (and understanding) any bit of linguistic performance, grammar would be one part of the total equation, with its own distinctive contribution.[11]

Two things are noteworthy about this. First, it is hard, very hard. It requires understanding how (at least) two “components” function as well as understanding how they interact.  As interactions need not be additive, this can be a real pain, even under ideal conditions where we really know a lot (that’s why engineers need to do more than simply apply the known physics/chemistry/biology). Moreover, interaction effects can be idiosyncratic and localized, working differently in different circumstances (just ask your favorite psycho/neuro linguist about task effects). So, this kind of work is both very demanding and subject to quirky destabilizing effects. Recall Fodor’s observation: the more modular a problem is, the more likely it is solvable at all. This reflects the problems that interaction effects generate.[12]

At any rate, this is the standard way science proceeds when approaching complex phenomena. It factors a phenomenon into its parts and then puts these parts back together. It is often called atomism or reductionism but it is really just analysis with synthesis and it has proven to be the only real game in town.[13] That said, many bridle at this approach and yearn for more holistic methods. Connectionists used to sing the praises of holism: only the whole system computes! You cannot factor a problem into its parts without destroying it. Holists often urge simulation in place of analysis (let’s see how the whole thing runs). People like me find this to be little more than the promotion of obscurantism (and not only me, see here for a nice takedown in the domain of face perception).

Why do I mention this here? Because, there is a sense in which C&C seems to object not only to the idea that Grammar is real, but also to the idea that the right way to approach these interaction effects is analytically. C&C doesn’t actually say this, but it suggests it in its claims that the secret to understanding language in the wild lies with how all kinds of information are integrated quickly in the here and now. The system as a whole gives rise to structure (which “may [note the weasel word here, btw, NH] explain why language structure and processing is highly local in the linguistic signal” (5))[14] and the interaction of the various factors eases the interpretation problem (though as Gleitman and Trueswell and friends have shown, having too much information is itself a real problem (see here, here and here)). The prose in C&C suggests to me that only at the grain of the blooming buzzing interactive whole will linguistic structure emerge. If this is right, then the problem with the standard view is not merely that it endorses the reality of grammar, but that it takes the right approach to be analytic rather than holistic. Again, C&C does not expressly say this, but it does suggest it, and it makes sense of its dismissal of “fragmented” investigations of the complex phenomenon. In their view, we need to solve all the problems at once and together, rather than piecemeal and then fit them together. Of course, we all know that there is no “best” way to proceed in these complex matters; that sometimes a more focused view is better and sometimes a more expansive one. But the idea that an analytic approach is “doomed to fail” (1) surely bespeaks an antipathy towards the analytic approach to language.

An additional point: note that if one thinks that all there is to language is the statistical piecing together of diverse kinds of information then one is really against the idea that language in the wild is the result of interacting distinct modules with their own powers and properties. This, again, is an old idea. Again you all know who believed this (hint, starts with an E). So, if one were looking for an overarching unifying theme in C&C, one that is not trotted out explicitly but holds the paper together, then one could do worse than look to Associationism/Empiricism. This is the glue that holds the various parts together, from the hostility to the very idea that grammars are real to the conviction that the analytic approach (standard in the sciences) is doomed to failure.

There is a lot of other stuff in this paper that is also not very good (or convincing). But, I leave it as an exercise to the reader to find these and dispose of them (take a look at the discussion of language and cultural evolution for a real good time (on p.6). I am not sure, but it struck me as verging on the incoherent and mixing up the problem of language change with the problem of the emergence of a facility for language). Suffice it to say that C&C adds another layer to the pile of junk written on language perpetrated on the innocent public by the prestige journals. Let me end with a small rant on this.

C&C appeared in Nature. This is reputed to be a fancy journal with standards (don’t believe it for a moment. It’s all show business now).[15] I doubt that Nature believes that it publishes junk. Maybe it takes it to be impossible to evaluate opinion or “comment” pieces. Maybe it thinks that taste cannot be adjudicated. Maybe. But I doubt it. Rather, what we are witnessing here is another case of Chomsky bashing, with GG as collateral damage. It is not only this, but it is some of this. The other factor is the rise of big data science. I will have something to say about this in a later post. For now, take a look at C&C. It’s the latest junk installment of a story that doesn’t get better with repetition. But in this case all the arguments are stale as well as being dumb. Maybe their shelf expiration date will come soon. One can only hope even if such hope is irrational given the evidence.

[1] Type ‘Evans and Levinson’ (cited in C&C) or ‘Vyvyan Evans’ or ‘Everett’ in the search section for a bevy of replies to the old tired incorrect claims that C&C throws out like confetti at a victory parade.
[2] Actually, I assume that Chomsky’s observations are just another footnote to Plato or Aristotle, though I don’t know what text he might have been footnoting but, as you know the guy LOVES footnotes!
[3] The great sin of Generative Semantics was to conflate grammaticality and acceptability by, in effect, treating any hint of unacceptability as something demanding a grammatical remedy.
[4] I should add that the distinction between these two kinds of structures (center vs self embedding) is still often ignored or run together. At times, it makes one despair about whether there is any progress at all in the mental sciences.
[5] And which, among other things, led Chomsky to deny that there is a clean grammatical/ungrammatical distinction, insisting that there are degrees of grammaticality as well as the observed degrees of acceptability. Jon Sprouse is the contemporary go-to person on these issues.
[6] And recall, the autonomy of syntax thesis is a very weak claim. It states that syntactic structure is real and hence not reducible to observable features of the linguistic expression. So syntax is not just a reflex of meaning or sound or probabilistic distribution or pragmatic felicity or… Denying this weak claim is thus a very strong position.
[7] There is an excellent discussion of the autonomy of syntax and what it means and why it is important in the forthcoming anniversary volume on Syntactic Structures edited by Lasnik, Patel-Grosz, Yang et moi. It will make a great stocking stuffer for the holidays so shop early and often.
[8] Certainly Jackendoff, the daddy of constructionism, has written as much.
[9] Here is a good place to repeat sotto voce and reverentially: ‘Colorless green ideas sleep furiously’ and contrast it with ‘Furiously sleep ideas green colorless.’
[10] Indeed, if I am right about the Associationist/Empiricist subtext in C&C then C&C does not actually believe that there are inherent limits on possible constructions. On this reading of C&C the absence of mirror image constructions is actually just a fact about their absence in the relevant linguistic environment. They are fine potential constructions. They just happen not to occur.  One gets a feeling that this is indeed what C&C thinks by noting how impressed it is with “the awe-inspiring diversity of the world’s languages” (6). Clearly C&C favors theories that aim for flexibility to cover this diversity. Linguists, in contrast, often focus on “negative facts,” possible data that is regularly absent. These have proven to be reliable indicators of underlying universal principles/operations. The fact that C&C does not mention this kind of datum is, IMO, a pretty good indicator that it doesn’t take it seriously. Gaps in the data are accidents, a position that any Associationist/Empiricist would naturally gravitate towards. In fact, if you want a reliable indicator of A/E tendencies look for a discussion of negative data. If it does not occur, I would give better than even odds that you are reading the prose of a card carrying A/Eer.
[11] Linguists do differ on whether this is a viable project in general (i.e. likely to be successful). But this is a matter of taste, not argument. There is no way to know without trying.
[12] For example, take a look at this recent piece on the decline of the bee population and the factors behind it. It ends with a nice discussion of the (often) inscrutable complexity of interaction effects:

Let's add deer to the list of culprits, then. And kudzu. It's getting to be a long list. It's also an indication of what a complex system these bees are part of. Make one change that you don't think has anything to do with them -- develop a new pesticide, enact a biofuels subsidy, invent the motorized lawnmower -- and the bees turn out to feel it.

[13] Actually, it used to be the only game in town. There are some urging that scientific inquiry give up the aim of understanding. I will be writing a post on this anon.
[14] This btw, is not even descriptively correct given the myriad different kinds of locality that linguists have identified. Indeed, so far as I know, there is no linear bound between interacting morphemes anywhere in syntax (e.g. agreement, binding, antecedence, etc.).
[15] It’s part of the ethos of the age. See here for the theme song.

Monday, August 14, 2017

Grammars and functional explanations

One of the benefits of having good colleagues and a communal department printer is that you get to discover interesting stuff you would never have run across. The process (I call it “research,” btw) is easy: go get what you have just printed out for yourself and look at the papers that your colleagues have printed out for themselves that are lying in the pick-up tray. If you are unscrupulous you steal it from the printer tray and let your colleague print another for her/himself. If you are honest you make a copy of the paper and leave the original for your collegial benefactor (sort of a copy theory of mental movement). In either case, whatever the moral calculus, there is a potential intellectual adventure waiting for you every time you go and get something you printed out. All of this is by way of introducing the current post topic. A couple of weeks ago I fortuitously ran into a paper that provoked some thought, and that contained one really pregnant phrase (viz. “encoding-induced load reduction”). Now, we can all agree that the phrase is not particularly poetic. But it points to a useful idea whose exploration was once quite common. I would like to say a few words about the paper and the general idea as a way of bringing both to your attention.

The paper is by Bonhage, Fiebach, Bahlmann and Mueller (BFBM). It has two main aims: (i) to describe how coding for the structural features of language unfolds over time and (ii) to identify the neural implementations of this process. The phenomenal probe into this process is the Sentence Superiority Effect (SSE). SSE is “that observation that sentences are remembered better than ungrammatical word strings” (1654). Anyone who has crammed for an exam where there is lots of memorization is directly acquainted with the SSE. It’s a well-known trick for keeping otherwise disparate information available: concoct sentences/phrases as mnemonic devices. This is the fancy version of that. At any rate, it exists, is behaviorally robust and is a straightforward bit of evidence for online assignment of grammatical structure where possible. More exactly, it is well known that “chunking” enhances memory performance and it seems, not surprisingly, that linguistic structure affords chunking. Here is BFBM (1656):

Linguistically based chunking can also be described as an enriched encoding process because it entails, in addition to the simple sequence of items, semantic and syntactic relations between items… [W]e hypothesize that the WM benefit of sentence structure is to a large part because of enriched encoding. This enriched encoding in turn is hypothesized to result in reduced WM [working memory, NH] demands during the subsequent maintenance phase…

BFBM locates the benefit specifically in maintaining a structure in memory, though there is a cost for the encoding. This predicts increased activity in those parts of the brain wherein coding happens and reduced activity in parts of the brain responsible for maintaining the coded information. As BFBM puts it (1656):

With respect to functional neuroanatomy, we predict that enriched encoding should go along with increased activity in the fronto-temporal language network for semantic and syntactic sentence processing.

During the subsequent maintenance period, we expected to see reduced activity for sentence material in VWM systems responsible for phonological rehearsal because of the encoding-induced load reduction.

The paper makes several other interesting points concerning (i) the role of talking to oneself sotto voce in memory enhancement (answer: not an important factor in SSE), (ii) the degree to which the memory structures involved in the SSE are language specific or more domain general (answer: both language areas and more general brain areas are involved) and (iii) the relative contribution of syntactic vs semantic structure to the process (somewhat inconclusive IMO). At any rate, I enjoyed going through the details and I think you might as well.

But what I really liked is the program of linking linguistic architectures with more general cognitive processes. Here, again, is BFBM (1666):

But how does the involvement of the semantic system contribute to a performance advantage in the present working memory task? One possible account is chunking of information. From cognitive psychology, we know that chunking requires the encoding of at least two hierarchical levels: item level and chunk level (Feigenson & Halberda, 2004). The grammatical information contained in the word list makes it possible to integrate the words (i.e., items) into a larger unit (i.e., chunk) that is specified by grammatical relationships and a basic meaning representation, as outlined above. This constitutes not only a syntactically but also a semantically enriched unit that contains agents and patients characterized, for example, by specific semantic roles. Additional encoding of sentence-level meaning of this kind, triggered by syntactic structure, might facilitate the following stages (i.e., maintenance and retrieval) of the working memory process.

So, there is a (not surprising) functional interaction between grammatical coding and enhanced memory (through load reduction) through reduced maintenance costs in virtue of there existing an encoding of linguistic information above the morpheme/word level. Thus, the existence of G encodings fits well with the cognitive predilections of memory structure (in this case maintenance).
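For those who like to see such things spelled out, here is a toy sketch of the trade-off just described: grouping words into chunks costs a bit more at encoding but leaves fewer units to maintain. Everything in it is an assumption made for illustration (the `Chunk` class, the two cost functions, the one-extra-step-per-chunk cost, and the flat bracketing of the example sentence); none of it comes from BFBM, and it is in no way a model of working memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A hypothetical grammatical unit grouping several words."""
    label: str
    items: tuple


def maintenance_load(units):
    """Toy measure: number of top-level units held during maintenance."""
    return len(units)


def encoding_cost(units):
    """Toy measure: one step per word, plus one extra step per chunk built."""
    cost = 0
    for unit in units:
        if isinstance(unit, Chunk):
            cost += len(unit.items) + 1
        else:
            cost += 1
    return cost


# Unstructured word string: six independent items.
word_list = ["the", "dog", "chased", "a", "cat", "yesterday"]

# The same words with an (assumed, deliberately flat) grammatical grouping.
sentence = [
    Chunk("NP", ("the", "dog")),
    Chunk("VP", ("chased", "a", "cat")),
    "yesterday",
]

for name, units in (("word list", word_list), ("sentence", sentence)):
    print(name, "encoding:", encoding_cost(units),
          "maintenance:", maintenance_load(units))
```

On these made-up numbers the structured version is more expensive to encode and cheaper to maintain, which is the qualitative shape of the BFBM prediction and nothing more.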

Like I said, this general idea is very nice and is one that some of my closest friends (and relatives) used to investigate extensively. So, for example, Berwick and Weinberg (B&W) tried to understand the Subjacency Condition in terms of its functional virtues wrt efficient left corner parsing (see, e.g. here). Insightful explorations of the “fit” between G structure and other aspects of cognition are rarish, if for no other reason than that they require really knowing something about the “interfaces.” Thus, you need to know something about parsers and Gs to do what B&W attempted. Ditto with current work on the meaning of quantifiers and the resources of the analogue number system embedded in our perceptual systems (see here). Discovering functional fit requires really understanding properties of the interfaces in a non-trivial way. And this is hard!

That said, it is worth doing, especially if one’s interests lie in advancing minimalist aims. We expect to find these kinds of dependencies, and we expect that linguistic encodings should fit snugly with non-linguistic cognitive architecture if “well-designed.”[1] Moreover, it should help us to understand some of the conditions that we find regulate G interactions. So, for example, Carl De Marcken exhibited the virtues of headedness for unsupervised learners (see here for discussion and links). And, it seems quite reasonable to think that Minimality is intimately connected with the fact that biological memory is subject to similarity based interference effects. It is not a stretch, IMO, to see minimality requirements as allowing for “encoding-induced load reduction” by obviating (some of) the baleful effects of similarity based interference endemic to a content addressable memory system like ours. Or, to put this another way, one virtue of Gs that include a minimality restriction is that they will lessen the cognitive memory load on performance systems that use these Gs (most likely for acts of retrieval (vs maintenance)).

Minimalism invites these kinds of non-reductive functional investigations. It invites asking: how does the code matter when used? Good question, even if non-trivial answers are hard to come by.

[1] Yes, I know, there are various conceptions of “good design” only some of which bear on this concept. But one interesting thing to investigate is the concept mooted here for it should allow us to get a clearer picture of the structure of linguistic performance systems if we see them as fitting well with the G system. This assumption allows us to exploit G structure as a probe into the larger competence plus production system. This is what BFBM effectively does to great effect.

Thursday, July 27, 2017


I am off for two weeks and will not post anything in that time. I will be sitting by a lake, drinking and eating to excess with friends and family. Hope you can do something similar.

The logic of GG inquiry

In the last post I was quite critical of a piece that I thought mischaracterized the nature of linguistic inquiry of the Chomsky GG variety. I thought that I should do more than hector from the sidelines (though when Hector left the sidelines things did not end well for him). Here is an attempt to outline what is not (or should not be) controversial. It tries to outline the logic of GG investigations, the questions that orient it, and the rational history that follows from pursuing these questions systematically. This is not yet a piece for the uninitiated, but fleshed out, I think it could serve as a reasonably good intro into what one stripe of linguist does and why. There is need for more filling (illustrations of how linguists get beyond and build on the obvious). But, this is a place to start, IMO, and if one starts here lots of misconceptions will be avoided.

Linguistics (please note the ‘i’ here) revolves around three questions:

(1)  What’s a possible linguistic structure in L?
(2)  What’s a possible G (for a given PLD)?
(3)  What’s a possible FL (for humans)?

These three questions correspond to three facts:

(1’)      The fact of linguistic creativity (a native speaker can and does regularly produce and understand linguistic objects never before encountered by her/him)

(2’)      The fact of linguistic promiscuity (any kid can acquire any language in (roughly) the same way as any other kid/language)

(3’)      The fact of linguistic idiosyncrasy (humans alone have the linguistic capacities they evidently have (i.e. both (1’) and (2’) are species specific facts))

Three big facts, three big questions concerning those facts. And three conclusions:

(1’’)     Part of what makes native speakers proficient in a language is their cognitive internalization of a recursive G

(2’’)     Part of human biology specifies a species wide capacity (UG) to acquire recursive Gs (on the basis of PLD)

(3’’)     Humans alone have evolved the Gish capacities and meta-capacities specified in (1’’) and (2’’) in the sense that our ancestors did not have this meta-capacity (nor do other animals) and we do

IMO, the correctness of these conclusions is morally certain (certain in the sense that, though not logically required, they are trivially obvious and indubitable once the facts in (1’-3’) are acknowledged). Or, to put this another way, the only way to deny the trivial truths in (1’’-3’’) is to deny the trivial facts in (1’-3’). Note that this does not mean that these are the only questions one can ask about language, but if the questions in (1-3) are of interest to you (and nobody can force anybody to be interested in any question!), then the consequences that follow from them are sound foundations for further inquiry. When Chomsky claims that many of the controversial positions he has advanced are not really controversial, this is what he means. He means that whatever intellectual contentiousness exists regarding the claims above in no way detracts from their truistic nature. Trivial and true! Hence, intellectually uncontroversial. He is completely right about this.

So, humans have a species specific dedicated capacity to acquire recursive Gs. Is this all that we can trivially deduce from obvious facts? Nope. We can also observe that these recursive Gs have a side that we can informally call a meaning (M), and a side that we can informally call a sound (S) (or, more precisely, an articulation). So, the recursive G pairs meanings with sounds (in Chomsky’s current formulation of the old Aristotelian observation (and yes, it is very old because very trivial)). And this unbounded pairing of Ms and Ss is biologically novel in humans. Does this mean that anything we can call language rests on properties unique to humans? Nope. All that follows (but it does follow trivially) is that this unbounded capacity to pair Ms and Ss is biologically species specific. So, even if being able to entertain thoughts is not biologically specific and the capacity to produce sounds (indeed many many) is not biologically unique, the capacity to pair Ms with Ss open-endedly IS. And part of the project of linguistics is to explain (i) the fine structure of the Gs we have that subvene this open-ended pairing, (ii) the UG (i.e. meta-capacity) we have that allows for the emergence of such Gs in humans and (iii) a specification of how whatever is distinctively linguistic about this meta-capacity fits in with all the other non linguistically proprietary and exclusively human cognitive and computational capacities we have to form the complex capacity we group under the encyclopedia entry ‘language.’

The first two parts of the linguistic project have been well explored over the last 60 years. We know something about the kinds of recursive procedures that particular Gs deploy and something about the possible kinds of operations/rules that natural language Gs allow. In other words, we know quite a bit about Gs and UG. Because of this in the last 25 years or so it has become fruitful to ruminate about the third question: how it all came to pass, or, equivalently, why we have the FL we have and not some other? It is a historic achievement of the discipline of linguistics that this question is ripe for investigation. It is only possible because of the success in discovering some fundamental properties of Gs and UG. In other words, the Minimalist Program is a cause for joyous celebration (cue the fireworks here). And not only is the problem ripe, there is a game plan. Chomsky has provided a plausible route towards addressing this very hard problem.

Before outlining the logic (yet again) let’s stop and appreciate what makes the question hard. It’s hard because it requires distinguishing two different kinds of universals, those that are cognitively and computationally general from those that are linguistically proprietary, and doing this in a principled way. And that is hard. Very very hard. For it requires thinking of what we formerly called UG as an interaction effect, and hence as not a unitary kind of thing. Let me explain.

The big idea behind minimalism is that much of the “mechanics” behind our linguistic facility is not linguistically parochial. Only a small part is. In practical terms, this means that much of what we identified as “linguistic universals” from about the mid 1960s to the mid 1990s are themselves composed of operations only some of which are linguistically proprietary. In other words, just as GB proposed to treat constructions as the interaction of various kinds of more general mechanisms rather than as unitary linguistic “rules,” now minimalism is asking that we think of universals as themselves composed of various kinds of interacting, more primitive computational and cognitive operations, only some of which are linguistically proprietary.

In fact, the minimalist conceit is that FL is mostly comprised of computational operations that are not specific to language. Note the ‘mostly.’ This means that at least some part of FL is linguistically specific/special (remember 3/3’/3’’ above). The research problem is to separate the domain specific wheat from the domain general chaff. And that requires treating most of the “universals” heretofore discovered as complexes and showing how their properties could arise from the interaction of the general and specific operations that make them up. And that is hard both analytically and empirically.

Analytically it is hard because it requires identifying plausible candidates for the domain general and the linguistically proprietary operations. It is empirically difficult for it requires expanding how we evaluate our empirical results. An analogy with constructions and their “elimination” as grammatical primitives might make this clearer.

The appeal of constructions is that they correspond fairly directly to observable surface features of a language. Topicalizations have topics which sit on the left periphery. Topics have certain semantic properties. Topicalizations allow unbounded dependencies between the topic and a thematic position, though not if the gap is inside an island and the gap is null.  Topicalization is similar to, but different from Wh-questions, which are in some ways similar to focus constructions, and in some ways not and all are in some ways similar to relative clause constructions and in some ways not. These constructions have all been described numerous times identifying more and more empirical nuances. Given the tight connection between constructions and their surface indicators, they are very handy ways of descriptively carving up the data because they provide useful visible landmarks of interest. They earn their keep empirically and philologically. Why then dump them? Why eliminate them?

Mid 1980s theory did so because they inadequately answer a fundamental question: why do constructions that are so different in so many ways nonetheless behave the same way as regards, say, movement? Ross established that different constructions behaved similarly wrt island effects, so the question arose as to why this was so. One plausible answer is that despite their surface differences, various constructions are composed from similar building blocks. More concretely, all the identified constructions involve a ‘Move Alpha’ (MA) component and MA is subject to locality conditions of the kind that result in island effects if violated. So, why do they act the same? Because they all use a common component which is grammatically subject to the relevant locality condition.

Question asked. Question answered. But not without failing to cover all the empirical ground constructions did. Thus, what about all the differences? After all, nobody thinks that Topicalization and Relativization are the same thing! Nobody. All that is claimed is that they are formed exploiting a common sub-operation and that is why they all conform to island restrictions. How are the differences handled? Inelegantly. They are “reduced” to “criterial conditions” that a head imposes on its spec or feature requirements that a probe imposes on its goal. In other words, constructions are factored into the UG relevant part (subject to a specific kind of locality) and the G idiosyncratic part (feature/criteria requirements between heads and phrases of a certain sort). In other words, constructions are “eliminated” in the sense of being grammatically basic, not in being objects of the language with the complex properties they have.  Constructions, in other words, are the result of the complex interactions of more primitive Gish operations/features/principles. They are interaction effects, with all the complexity this entails.

But this factorization is not enough. One more thing is required to make deconstructing constructions into their more basic constituent parts theoretically and empirically worthwhile. It is required that we identify some signature properties of the more abstract MA that is a fundamental part of the other constructions, and that’s what all the fuss about successive cyclicity was about. It was interesting because it provided a signature property of the movement operation: what appears to be unbounded movement is actually composed of small steps, and we were able to track those steps. And that was/is a big deal. It vindicated the idea that we should analyze complex constructions as the interaction of more basic operations.

Let’s now return to the problem of distilling the domain general from the domain specific wrt FL. This will be hard for we must identify plausible operations of each type, show that in combination they yield comparable empirical coverage as earlier UG principles, and identify some signature properties of the domain specific operations/principles. All of this is hard to do, and hence the intellectual interest of the problem.

So what is Chomsky’s proposed route to this end? His proposal is to take recursive hierarchy as the single linguistically specific property of FL. All other features of FL are composite. The operation that embodies this property is, of course, Merge. The conceit is that the simplest (or at least one very simple) operation that embodies this property also has other signature properties we find universally in Gs (e.g. embodies both structure building and displacement, provides G format for interpretation and reconstruction effects, etc.[1]). So identify the right distinctive operation and you get as reward an account for why Gs display some signature properties.
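For the computationally inclined, here is a minimal sketch of how one simple operation can deliver both structure building and displacement. It assumes the familiar set-theoretic rendering of Merge (Merge(a, b) = {a, b}) and ignores labels, features and linear order entirely. The function names (`merge`, `contains`, `internal_merge`, `depth`) are hypothetical labels chosen for illustration, not standard notation.

```python
def merge(a, b):
    """Combine two syntactic objects into an unordered set {a, b}."""
    return frozenset({a, b})


def contains(obj, part):
    """True if `part` is `obj` itself or occurs somewhere inside it."""
    if obj == part:
        return True
    if isinstance(obj, frozenset):
        return any(contains(child, part) for child in obj)
    return False


def internal_merge(obj, part):
    """Merge `obj` with one of its own proper sub-parts (displacement).

    Illustrative assumption: "movement" is just Merge applied to an
    object and something already contained in it.
    """
    if obj == part or not contains(obj, part):
        raise ValueError("part must be a proper sub-constituent of obj")
    return merge(part, obj)


def depth(obj):
    """Levels of hierarchical embedding (lexical items count as 0)."""
    if not isinstance(obj, frozenset):
        return 0
    return 1 + max(depth(child) for child in obj)


# External Merge builds hierarchy: {saw, {the, man}}
dp = merge("the", "man")
vp = merge("saw", dp)

# Internal Merge re-merges a part: {{the, man}, {saw, {the, man}}}
moved = internal_merge(vp, dp)
```

Nothing beyond `merge` is doing the work here: `internal_merge` only checks that the second argument is already inside the first. On this toy rendering the displaced phrase occurs both at the edge of `moved` and inside `vp`, which is the sort of format that reconstruction effects would need, and each application adds a level of hierarchy without any bound.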

Does this mean that FL only contains Merge? No. If true, it means that Merge is the only linguistically distinctive operation of this cognitive component. FL has other principles and operations as well. So feature checking is a part of FL (Gs do this all the time and it is the locus of G differences), though it is unlikely that feature checking is an operation proprietary to FL (even though Gs do it and FL exploits it). Minimality is likely an FL property, but one hopes that it is just a special instance of a more general property that we find in other domains of cognition (e.g. similarity based interference).[2] So too with phases (one hopes), which function to bound the domain of computations, something that well designed systems will naturally do. Again, much of the above are promissory notes, not proposals, but hopefully you get the idea. Merge, in combination with these more generic cognitive and computational operations, works to deliver an FL.

IMO (not widely shared I suspect), the program is doing quite well in providing a plausible story along these lines. Why do we have the FL we have? Because it is the simplest (or very simple) combination of generic computational and cognitive principles plus one very simple linguistically distinctive operation that yields a most distinctive feature of human linguistic objects, unbounded hierarchy.

Why is simple important? Because it is a crucial ingredient of the phenotypic gambit (see here). We are assuming that simple and evolvable are related. Or, more exactly, we are taking phenotypically simple as proxy for genetically simple as is typical in a lot of work on evolution.[3]

So linguistics starts from three questions rooted in three basic facts and resulting in three kinds of research; into G, into UG and into FL. These questions build on one another (which is what good research questions in healthy sciences do). The questions get progressively harder and more abstract. And, answers to later questions prompt revisions of earlier conclusions. I would like to end this over long disquisition with some scattered remarks about this.

As noted, these projects take in one another’s wash. In particular, the results of earlier lines of inquiry are fodder for later ones. But they also change the issues. MP refines the notion of a universal, distancing it even more than its GB ancestor does from Greenbergian considerations. GB universals are quite removed from the simple observations that motivate a Greenberg style universal (recall: they are largely based on negative data). However, MP universals are even some distance from classical GB universals in that MP worries the distinction between those cognitive features that are linguistically proprietary and those that are not in a way that GB seldom (never?) did. Consequently, MP universals (e.g. Merge) are even more “abstract” than their GBish predecessors, which, of course, makes them more remote from the kind of language particular data that linguists are trained to torture for insights.

Or to put this another way: MP is necessarily less philologically focused than even GB was. The focus of inquiry is explicitly the fine structure of FL. This was also true of earlier GBish theories, but, as I’ve noted before, this focus could be obscured. The philologically inclined could have their own very good reasons for “going GB,” even absent mentalist pretensions. MP’s focus on the structure of FL makes it harder (IMO, impossible) to evade a mentalist focus.[4]

A particularly clear expression of the above is the MP view of parameters. In GBish accounts parameters are internal properties of FL that delimit the class of possible Gs. Indeed, Chomsky made a big deal of the fact that in P&P theories there were a finite number of Gs (though perhaps a large finite number) dependent on the finite number of choices for values FL allowed. This view of parameters fit well with the philologist’s interest in variation, for it proposed that variation was severely confined, limited to a finite number of possible differences. On this view, the study of variation feeds into a study of FL/UG by way of a study of the structure of the finite parameter space. So, investigating different languages and how they vary is, on this view, the obvious way of studying the parametric properties of FL.

But, from an MP point of view, parameters are suspect. Recall, the conceit is that the less linguistic idiosyncrasy built into FL, the better. Parameters are very very idiosyncratic (is TP or CP a bounding node? Are null subjects allowed?). So the idea of FL internal parameters is MP unwelcome. Does this deny that there is variation? No. It denies that variation is parametrically constrained. Languages differ, there is just no finite limit to how they might.

Note that this does not imply that anything goes. It is possible that no Gs allow some feature without it being the case that there is a bound on what features a G will allow. So invariances (aka: principles) are fine. It’s parameters that are suspect. Note, that on this view, the value of work on variation needs rethinking. It may tell you little about the internal structure of FL (though it might tell you a lot about the limits of the invariances).[5]

Note further that this drives a wedge between standard linguistic research (so much is dedicated to variation and typology) and the central focus of MP research, the structure of FL. In contrast to P&P theories where typology and variation are obviously relevant for the study of FL, this is less obvious (I would go further) in an MP setting. I tend to think that this fact influences how people understand the virtues and achievements of MP, but as I’ve made this point before, I will leave it be here.

Last, I think that the MP problematic encourages a healthy disdain for surface appearances, even more so than prior GBish work. Here’s what I mean: if your interest is in simplifying FL and relating the distinctive features of language to Merge then you will be happy downplaying surface morphological differences. So, for example, if MP leads you to consider a Merge based account of binding, then reflexive forms (e.g. ‘himself’) are just the morphological residues of I-merge. Do they have interesting syntactic properties? Quite possibly not. They are just surface detritus. Needless to say, this way of describing things can be seen, from another perspective, as anti-empirical (believe me, I know whereof I write). But if we really think that all that is G distinctive leads back to Merge then if you think that c-command is a distinctive product of Merge and you find this in binding then you will want to unify I-merge and binding theory so as to account for the fact that binding requires c-command. But this will then mean ignoring many differences between movement and binding, and one way to do this is to attribute the differences to idiosyncratic “morphology” (as we did in eliminating constructions). In other words, from an MP perspective there are reasons to ignore some of the data that linguists hold so dear.

There is a line (even Chomsky has pushed it) that MP offers nothing new. It is just the continuation of what we have always done in GG. There is one sense in which I think that this is right. The questions linguists have investigated follow a natural progression if one’s interest is in the structure of FL. MP focuses on the next natural question to ask given the prior successes of GG. However, the question itself is novel, or at least it is approachable now in ways that it wasn’t before. This has consequences. I believe that one of the reasons behind a palpable hostility to MP (even among syntacticians) is the appreciation that it does change the shape of the board. Much of what we have taken for granted is rightly under discussion. It is like the shift away from constructions, but in an even more fundamental way.

[1] See here for more elaborate discussion of the Merge Hypothesis
[2] I discuss this again in a forthcoming post. I know you cannot wait.
[3] In other words, this argument form is not particularly novel when applied to language. As such one should take care to avoid methodological dualism and not subject the linguistic application of this gambit to higher standards than generally apply.
[4] See here for more discussion of this point.
[5] A personal judgment: I don’t believe that cross-linguistic study has generally changed our views about the principles. But this is very much a personal view, I suspect.

Thursday, July 20, 2017

Is linguistics a science?

I have a confession to make: I read (and even monetarily support) Aeon. I know that they publish junk (e.g. Evans has dumped junk on its pages twice), but I think the idea of trying to popularize the recondite for the neophyte is a worthwhile endeavor, even if it occasionally goes awry. I mention this because Aeon has done it again. The editors clearly understand the value (measured in eyeballs) of a discussion of Chomsky. And I was expecting the worst, another Evans like or Everett like or Wolfe like effort. In other words I was looking forward to extreme irritation. To my delight, I was disappointed. The piece (by Arika Okrent here) got many things right. That said, it is not a good discussion and will leave many more confused and misinformed than they should be. In what follows I will try to outline my personal listing of pros and cons. I hope to be brief, but I might fail.

The title of Okrent’s piece is the title of this post. The question at issue is whether Chomskyan linguistics is scientific. Other brands get mentioned in passing, but the piece, Is linguistics a science? (ILAS), is clearly about the Chomsky view of GG (CGG). The subtitle sets (part of) the tone:

Much of linguistic theory is so abstract and dependent on theoretical apparatus that it might be impossible to explain

ILAS goes into how CGG is “so abstract” and raises the possibility that this level of abstraction “might” (hmm, weasel word warning!) make it incomprehensible to the non-initiated, but it sadly fails to explain how this distinguishes CGG from virtually any other inquiry of substance. And by this I mean not merely other “sciences” but even biblical criticism, anthropology, cliometrics, economics etc.  Any domain that is intensively studied will create technical, theoretical and verbal barriers to entry by the unprepared. One of the jobs of popularization is to allow non-experts to see through this surface dazzle to the core ideas and results. Much as I admire the progress that CGG has made over the last 60 years, I really doubt that its abstractions are that hard to understand if patiently explained. I speak from experience here. I do this regularly, and it’s really not that hard. So, contrary to ILAS, I am quite sure that CGG can be explained to the interested layperson and the vapor of obscurity that this whiff of ineffability spritzes into the discussion is a major disservice. (Preview of things to come: in my next post I will try (again) to lay out the basic logic of the CGG program in a way accessible (I hope) to a Sci Am reader).

Actually, many parts of ILAS are much worse than this and will not help in the important task of educating the non-professional. Here are some not so random examples of what I mean: ILAS claims that CGG is a “challenge to the scientific method itself” (2), suggests that it is “unfalsifiable” Popper-wise (2), that it eschews “predictions” (3), that it exploits a kind of data that is “unusual for a science” (5), suggests that it is fundamentally unempirical in that “Universal grammar is not a hypothesis to be tested, but a foundational assumption” (6), bemoans that many CGG claims are “maddeningly circular or at the very least extremely confusing” (6), complains that CGG “grew ever more technically complex,” with ever more “levels and stipulations,” and ever more “theoretical machinery” (7), asserts that MP, CGG’s latest theoretical turn, confuses “even linguists” (including Okrent!) (7) and may be more philosophy than science (7), moots the possibility that “a major part of it is unfalsifiable” and “elusive” and “so abstract and dependent on theoretical apparatus that it might be impossible to explain” (7), moots the possibility that CGG is post truth in that there is nothing (not much?) “at stake in determining which way of looking at things is the right one” (8), and ends with a parallel between Christian faith and CGG, which are described as “not designed for falsification” (9). These claims, spread as they are throughout ILAS, leave the impression that CGG is some kind of weird semi mystical view (part philosophy, part religion, part science), which is justifiably confusing to the amateur and professional alike. Don’t get me wrong: ILAS can appreciate why some might find this obscure hunt for the unempirical abstract worth pursuing, but the “impulse” is clearly more Aquarian (as in age of) than scientific. Here’s ILAS (8):

I must admit, there have been times when, upon going through some highly technical, abstract analysis of why some surface phenomena in two very different languages can be captured by a single structural principle, I get a fuzzy, shimmering glimpse in my peripheral vision of a deeper truth about language. Really, it’s not even a glimpse, but a ghost of a leading edge of something that might come into view but could just as easily not be there at all. I feel it, but I feel no impulse to pursue it. I can understand, though, why there are people who do feel that impulse.

Did I say “semi mystical”? Change that to pure Saint Teresa of Avila. So there is a lot to dislike here.[1]

That said, ILAS also makes some decent points and in this it rises way above the shoddiness of Evans, Everett and Wolfe. It correctly notes that science is “a messy business” and relies on abstraction to civilize its inquiries (1), it notes that “the human capacity for language,” not “the nature of language,” is the focus of CGG inquiry (5), it notes the CGG focus on linguistic creativity and the G knowledge it implicates (4), it observes the importance of negative data (“intentional violations and bad examples”) to plumbing the structure of the human capacity (5), it endorses a ling vs lang distinction within linguistics (“There are many linguists who look at language use in the real world … without making any commitment to whether or not the descriptions are part of an innate universal grammar”) (6), it distinguishes Chomsky’s conception of UG from a Greenberg version (sans naming the distinction in this way)  and notes that the term ‘universal grammar’ can be confusing to many (6):

The phrase ‘universal grammar’ gives the impression that it’s going to be a list of features common to all languages, statements such as ‘all languages have nouns’ or ‘all languages mark verbs for tense’. But there are very few features shared by all known languages, possibly none. The word ‘universal’ is misleading here too. It seems like it should mean ‘found in all languages’ but in this case it means something like ‘found in all humans’ (because otherwise they would not be able to learn language as they do.)

And it also notes the virtues of abstraction (7).

Despite these virtues (and I really like that above explanation of ‘universal grammar’), ILAS largely obfuscates the issues at hand and gravely misrepresents CGG. There are several problems.

First, as noted, a central trope of ILAS is that CGG represents a “challenge to the scientific method itself” (2). In fact one problem ILAS sees with discussions of the Everett/Chomsky “debate” (yes, scare quotes) is that it obscures this more fundamental fact. How is it a challenge? Well, it is un-Popperian in that it insulates its core tenets (universal grammar) from falsifiability (3).

There are three big problems with this description. First, so far as I can see, there is nothing that ILAS says about CGG that could not be said about the uncontroversial sciences (e.g. physics). They too are not Popper falsifiable, as has been noted in the philo of science literature for well over 50 years now. Nobody who has looked at the Scientific Method thinks that falsifiability accurately describes scientific practice.[2] In fact, few think that either Falsificationism or the idea that science has a method are coherent positions. Lakatos has made this point endlessly, Feyerabend more amusingly. And so has virtually every other philosopher of science (Laudan, Cartwright, Hacking to name three more). Adopting the Chomsky maxim that if a methodological dictum fails to apply to physics then it is not reasonable to hold linguistics to its standard, we can conclude that ILAS’s observation that certain CGG tenets are unfalsifiable (even if this is so) does not identify a problem peculiar to CGG. ILAS’s suggestion that it is is thus unfortunate.

Second, as Lakatos in particular has noted (but Quine also made his reputation on this, stealing the Duhem thesis), central cores of scientific programs are never easily directly empirically testable. Many linking hypotheses are required which can usually be adjusted to fend off recalcitrant data.  This is no less true in physics than in linguistics.  So, having cores that are very hard to test directly is not unique to CGG. 

Lastly, being hard to test and being unempirical are not quite the same thing. Here’s what I mean. Take the claim that humans have a species specific dedicated capacity to acquire natural languages. This claim rests on trivial observations (e.g. we humans learn French, dogs (smart as they are) don’t!). That this involves Gs in some way is trivially attested by the fact of linguistic creativity (the capacity to use and understand novel sentences). That it is a species capacity is obvious to any parent of any child. These are empirical truisms and so well grounded in fact that disputing their accuracy is silly. The question is not (and never has been) whether humans have these capacities, but what the fine structure of these capacities is. In this sense, CGG is not a theory, any more than MP is. It is a project resting on trivially true facts. Of course, any specification of the capacity commits empirical and theoretical hostages, and linguists have developed methods and arguments and data to test them. But we don’t “test” whether FL/UG exists because it is trivially obvious that it does. Of course humans are built for language, like ants are built to dead reckon or birds are built to fly or fish to swim. So the problem is not that this assumption is insulated from test and thus holding it is unempirical and unscientific. Rather, this assumption is not tested for the same reason that we don’t test the proposition that the Atlantic Ocean exists. You’d be foolish to waste your time. So, CGG is a project, as Chomsky has noted, and the project has been successful, as it has delivered various theories concerning how the truism could be true, and these are tested every day, in exactly the kinds of ways that other sciences test their claims. So, contrary to ILAS, there is nothing novel in linguistic methodology. Period. The questions being asked are (somewhat) novel, but the methods of investigation are pure white bread.[3] That ILAS suggests otherwise is both incorrect and a deep disservice.

Another central feature of ILAS is the idea that CGG has been getting progressively more abstract, removed from facts, technical, and stipulative. This is a version of the common theme that CGG is always changing and getting more abstruse. Is ILAS pining for the simple days of LSLT and Syntactic Structures? Has Okrent read these? (I actually doubt it, given that nobody under a certain age looks at these anymore.) At any rate, again, in this regard CGG is not different from any other program of inquiry. Yes, complexity flourishes, for the simple reason that more complex issues are addressed. That’s what happens when there is progress. However, ILAS suggests that contemporary complexity contrasts with the simplicity of an earlier golden age, and this is incorrect. Again, let me explain.

One of the hallmarks of successful inquiry is that it builds on insights that came before. This is especially true in the sciences, where later work (e.g. Einstein) builds on early work (e.g. Newton). A mark of this is that newer theories are expected to cover (more or less) the same territory as previous ones. One way of doing this is for newbies to have the oldsters as limit cases (e.g. you get Newton from Einstein when speeds are low relative to the speed of light). This is what makes scientific inquiry progressive (shoulders and giants and all that). Well, linguistics has this too (see here for the first of several posts illustrating this with a Whig History). Once one removes the technicalia (important stuff btw), common themes emerge that have been conserved through virtually every version of CGG accounts (constituency, hierarchy, locality, non-local dependency, displacement) in virtually the same way. So, contrary to the impression ILAS provides, CGG is not an ever more complex blooming buzzing mass of obscurities. Or at least not more so than any other progressive inquiry. There are technical changes galore as the bounds of empirical inquiry expand, and earlier results are preserved largely intact in subsequent theory. The suggestion that there is something particularly odd about the way that this happens in CGG is just incorrect. And again, suggesting as much is a real disservice and an obfuscation.

Let me end with one more point, one where I kinda like what ILAS says, but not quite. It is hard to tell whether ILAS likes abstraction or doesn’t. Does it obscure or clarify? Does it make empirical contact harder or easier?  I am not sure what ILAS concludes, but the problem of abstraction seems contentious in the piece.  It should not be. Let me end on that theme.

First, abstraction is required to get any inquiry off the ground. Data is never unvarnished. But more importantly, only by abstracting away from irrelevancies can phenomena be identified at all. ILAS notes this in discussing friction and gravitational attraction. It’s true in linguistics too. Everyone recognizes performance errors; most recognize that it is legit to abstract away from memory limitations in studying the G aspects of linguistic creativity. At any rate, we all do it, and not just in linguistics. What is less appreciated, I believe, is that abstraction allows one to hone one’s questions and makes it possible to make contact with empirics. It was when we moved away from sentences uttered to judgments about well formedness investigated via differential acceptability that we were able to start finding interesting Gish properties of native speakers. Looking at utterances in all their gory detail obscures what is going on. Just as with friction and gravity. Abstraction does not make it harder to find out what is going on, but easier.

A more contemporary example of this in linguistics is the focus on Merge. This abstracts away from a whole lot of stuff. But by ignoring many other features of G rules (besides the capacity to endlessly embed), it also allows inquiry to focus on key features of G operations: they spawn endlessly many hierarchically organized structures that allow for displacement, reconstruction, etc. It also allows one to raise new possibilities in simplified form (do Gs allow for SW movement? Is inverse control/binding possible?). Abstraction need not make things more obscure. Abstracting away from irrelevancies is required to gain insight. It should be prized. ILAS fails to appreciate how CGG has progressed, in part, by honing sharper questions by abstracting away from side issues. One would hope a popularization might do this. ILAS did not. It made abstraction’s virtues harder to discern.

One more point: it has been suggested to me that many of the flaws I noted in ILAS were part of what made the piece publishable. In other words, it’s the price of getting accepted.  This might be so. I really don’t know. But, it is also irrelevant. If this is the price, then there are worse things than not getting published.  This is especially so for popular science pieces. The goal should be to faithfully reflect the main insights of what one is writing about. The art is figuring out how to simplify without undue distortion. ILAS does not meet this standard, I believe.

[1] The CGG as mysticism meme goes back a long way. I believe that Hockett’s review of Chomsky’s earliest work made similar suggestions.
[2] In fact, few nowadays are able to identify a scientific method. Yes, there are rules of thumb like think clearly, try hard, use data etc. But the days of thinking that there is a method, even in the developed sciences, are gone.
[3] John Collins has an exhaustive and definitive discussion of this point in his excellent book (here). Read it and then forget about methodological dualism evermore.