Monday, March 20, 2017

Lexical items and "mere" morphology-1

This was intended to be short. I failed. It is long and will get longer. Here is part 1. Part 2 sometime later this week or next.

As I mentioned in an earlier post, I am in the process of co-editing a volume of commentary essays on Syntactic Structures (SS). The volume is scheduled to be out just in time for the holidays and will, I am sure, make a great gift.  Nothing like an anniversary copy of SS with a compendium of essays elaborating its nuances to while away the holidays. I mention this because the project has got me thinking about how our theories of grammar have changed over time. And this brings me to the topic of today’s question: are all morphemes created equal?

Interestingly, GG theories answer this question differently. SS and Aspects sharply distinguish, at least theoretically, between two kinds of morphemes: those that enter derivations via lexical insertion, and those that enter transformationally. In this way, these theories make a principled distinction between grammatical vs non-grammatical formatives and track their grammatical differences to different G etiologies.

Later theories (take GB as the poster child) distinguish lexical vs functional morphemes, but, and this is important, there appears to be no principled distinction here. The latter more closely track important G features, but both types of formatives enter derivations in the same way (via lexical insertion or as heads of X’ projections) and are manipulated by the same kinds of rules. The main difference (which I return to) is that some lexical items require specific grammatical licensing conditions (e.g. reflexives, pronouns, wh-elements) while others don’t (there is no grammatical licensing condition for ‘cat’ or ‘husband’). Functional elements are also often designated “closed class” items, but this classification carries no obvious theoretical import, at least within the theory of grammar. Rather, the designation is descriptive and adverts to the statistical frequency of these elements. Grammatically speaking, it is unclear what makes an expression “functional” beyond the fact that we designate it as such.

Minimalist accounts fall roughly on the GB side of these issues. This, I believe, is unfortunate, for the earlier distinction between lexical and grammatical formatives is, IMO, worth a modern investigation. Before saying a few words about why I believe this, let me indulge my penchant for Whig History and illustrate the distinction by contrasting the older Lees-Klima binding theory with the more modern GB view. Readers be warned: this will not be a short excursus.

Let’s start with the Lees-Klima (LK) (1963) account. The theory invokes the following two rules. They must apply when they can, and they are ordered so that (1) applies before (2).
(1) Reflexivization:
X-NP1-Y-NP2-Z --> X-NP1-Y-pronoun+self-Z
(where NP1 = NP2, pronoun has the φ-features of NP2, and NP1 and NP2 are in the same simplex sentence)
(2) Pronominalization:
X-NP1-Y-NP2-Z --> X-NP1-Y-pronoun-Z
(where NP1 = NP2 and pronoun has the φ-features of NP2)

As is evident, the two rules have very similar forms. Both apply to identical NPs and morphologically convert one to a reflexive or pronoun. (1), however, only applies to nominals in the same simplex clause, while (2) is not similarly restricted. As (1) obligatorily applies before (2), reflexivization will bleed the environment for the application of pronominalization by changing NP2 to a reflexive (thereby rendering the two NPs non-identical).  The rule ordering effectively derives the complementary distribution of bound pronouns and reflexives. 

An illustration will help make things clear. Consider the derivation of (3a). It has the underlying structure in (3b). We can factor this as in (3c), as per the reflexivization rule (1). Applying (1) converts (3c) to (3d), with the surface output (3e) carrying a reflexive interpretation.
(3)       a. John1 washed himself/*him
            b. John washed John
            c. X-John-Y-John-Z
            d. X-John-Y-him+self-Z
            e. John washed himself
What blocks John washed him with a similar reflexive reading? Deriving it requires that Pronominalization apply to (3c). However, it cannot, as (1) is ordered to obligatorily apply before (2). Once (1) applies we get (3d), and this no longer has a structural description amenable to (2). Thus, the application of (1) bleeds that of (2), and John washed him with a bound reading cannot be derived.

This changes in (4). Reflexivization cannot apply to (4c) as the two Johns are not in the same clause. As (1) cannot apply, (2) can (indeed, must) as it is not similarly restricted to apply to clausemates. In sum, the inability to apply (1) allows the application of (2). Thus does the LK theory derive the complementary distribution of reflexives and bound pronouns.
(4)       a. John believes that Mary washed *himself/him
            b. John believes that Mary washed John
            c. X-John-Y-John
            d. X-John-Y-him
            e. John believes that Mary washed him
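The bleeding logic of (3) and (4) can be made concrete with a toy Python sketch. It is a minimal illustration under simplifying assumptions that are mine, not LK's: a sentence is a flat list of tokens, each tagged with the index of its simplex clause; NP1 = NP2 is approximated by string identity of names; the `PRONOUN` table is a hypothetical stand-in for copying φ-features; and "obligatory" is modeled as applying each rule until it no longer matches, in the listed order. The names `Token`, `reflexivize`, `pronominalize` and `derive` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    word: str
    clause: int            # index of the simplex clause containing the token
    is_name: bool = False  # only underlying full NPs can satisfy NP1 = NP2

# Hypothetical stand-in for "pronoun has the phi-features of NP2".
PRONOUN = {"John": "him", "Mary": "her"}

def _rewrite(tokens, same_clause_required, suffix):
    """Find the first NP1 ... NP2 pair with NP1 = NP2 and rewrite NP2.

    The rewritten token is no longer a name, so it can never again count as
    identical to anything: this is what lets (1) bleed (2)."""
    for i, np1 in enumerate(tokens):
        for j in range(i + 1, len(tokens)):
            np2 = tokens[j]
            if not (np1.is_name and np2.is_name and np1.word == np2.word):
                continue
            if same_clause_required and np1.clause != np2.clause:
                continue
            out = list(tokens)
            out[j] = Token(PRONOUN[np2.word] + suffix, np2.clause)
            return out, True
    return tokens, False

def reflexivize(tokens):
    """Rule (1): clause-mates only; NP2 -> pronoun+self."""
    return _rewrite(tokens, same_clause_required=True, suffix="self")

def pronominalize(tokens):
    """Rule (2): no clause-mate restriction; NP2 -> pronoun."""
    return _rewrite(tokens, same_clause_required=False, suffix="")

def derive(tokens, rules):
    """Apply each rule obligatorily (until it no longer matches), in order."""
    for rule in rules:
        applied = True
        while applied:
            tokens, applied = rule(tokens)
    return " ".join(t.word for t in tokens)

# (3b) John washed John: a single simplex clause.
s3 = [Token("John", 0, True), Token("washed", 0), Token("John", 0, True)]
# (4b) John believes that Mary washed John: the embedded clause is clause 1.
s4 = [Token("John", 0, True), Token("believes", 0), Token("that", 1),
      Token("Mary", 1, True), Token("washed", 1), Token("John", 1, True)]

LK_ORDER = [reflexivize, pronominalize]
print(derive(s3, LK_ORDER))                      # John washed himself
print(derive(s4, LK_ORDER))                      # John believes that Mary washed him
print(derive(s3, [pronominalize, reflexivize]))  # John washed him
```

The last line shows what the ordering buys in this toy setting: run Pronominalization first and the locally bound him that (3a) stars is generated.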
There are other features of note:

· LK Grammars code for antecedence: Anaphoric dependency is grammatically specified. In other words, just as the antecedent of a reflexive is determined by (1), the antecedent of an anaphoric pronoun is determined by (2). If one understands “NP1 = NP2” to mean that the two nominals must (at least) have the same semantic value (i.e. that NP1 semantically binds NP2), then what the equality expresses is the idea that the grammar codes semantic binding and semantic antecedence.[1] This has two consequences. First, the grammar codes binding dependencies, not (co-)referential dependencies.[2] Second, there is no analogue of GB’s Condition B, which grammatically states an anti-binding restriction. (1) and (2) together determine the class of anaphoric dependencies. There is no specific coding for disjoint reference or anti-anaphora.[3]
· Some operations have priority over others. A key feature of the LK approach is that reflexivization obligatorily applies before pronominalization. Were the operations either freely ordered or not obligatory, then John hugged him would support the bound reading of the pronoun. In effect, the LK account embodies an economy conception wherein reflexivization is preferred to (is obligatorily ordered before) pronominalization. Absent this preference, locally bound pronouns would be grammatically generated. This point becomes evident if we consider a slightly different version of the Pronominalization rule. Assume that we added the following rider to (2): NP1 and NP2 are not contained in the same simplex clause. This codicil is analogous to the restriction in (1), where Reflexivization is limited to clause-mates. Interestingly, this amendment allows (1) and (2) to be freely ordered. The clause-mate condition in (1) restricts its application to clause-mated nominals, and the one in (2) restricts its application to non-clause-mated NPs. This suffices to prevent the illicit pronouns and reflexives in (3a)/(4a).[4]
· The LK approach is dependency centered, not morpheme centered. (1) and (2) primarily code antecedence relations, not morpheme distributions. A by-product of the dependency (in English) is the insertion of reflexive and pronominal morphemes. These are clearly surface morpho-phonological by-products of the established dependency and can be expected to differ across languages.[5] Stated more baldly, one can have reflexive and bound pronoun constructions without reflexives or bound pronouns. This gives the LK theory two distinctive characteristics when viewed with a modern eye. First, it distinguishes between morphemes that enter derivations from the lexicon and those that do not. Second, it endows this distinction between morpheme types with semantic significance. In the context of the Standard Theory, the LK background theory, bound pronouns and reflexives are semantically inert. Here Deep Structure exclusively determines semantic interpretation. Consequently, as reflexive and bound pronoun morphemes are not in Deep Structure but are introduced in the course of the syntactic derivation, they must be interpretively impotent. There is one more interesting consequence: the LK conception rejects a central feature of later accounts, namely that morphological identity is a good guide to syntactic or semantic categorization. In other words, for LK theorists, the mere fact that bound pronouns and deictic pronouns have the same morpho-phonological shape in many languages is at best a weak prima facie reason for treating them as a unified syntactic class.
· The binding rules in (1) and (2) also effectively derive a class of principle C effects, given the background assumption that reflexives and pronouns morphologically obscure an underlying copy of the antecedent.[6] The derivation, however, is not particularly deep. By stipulation, the rules retain the higher copies and morphologically reshape the lower ones into pronouns and reflexives. This has the effect of blocking the derivation of sentences like Himself kissed Bill, He thinks that John is tall, and (if the rules are ordered before WH-movement, aka Question formation) Who1 did he say t1 left. There are two noteworthy features of this account of principle C effects. First, as noted, it is not deep, for there is no reason why the rules could not have been stated so that the higher copy (optionally) gets morphologically transmogrified. Were this possible, all the indicated unacceptable sentences would be fully grammatically generated. Second, this version of principle C effects only holds for bound anaphors. It does not extend to co-referential dependencies, which fall outside the purview of this version of the binding theory. This is not, in itself, a bad thing. As has been noted, there are well known “exceptions” to principle C where co-reference is tolerated. On the LK account, this is to be expected.[7]
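The claim in the second point above, that adding the clause-mate rider to (2) lets the two rules be freely ordered, can be checked with a toy Python sketch. The assumptions are mine and all names are hypothetical: tokens carry the index of their simplex clause, string identity of names stands in for NP1 = NP2, `PRONOUN` stands in for copying φ-features, and an obligatory rule is applied until it stops matching. The sketch collects the surface outputs of every ordering of the two rules.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Token:
    word: str
    clause: int            # index of the simplex clause containing the token
    is_name: bool = False  # only underlying full NPs can satisfy NP1 = NP2

PRONOUN = {"John": "him"}  # hypothetical stand-in for copying phi-features

def make_rule(suffix, clause_ok):
    """Build a toy LK-style rule: rewrite NP2 when NP1 = NP2 and
    clause_ok(same_clause) holds for the pair."""
    def rule(tokens):
        for i, np1 in enumerate(tokens):
            for j in range(i + 1, len(tokens)):
                np2 = tokens[j]
                if (np1.is_name and np2.is_name and np1.word == np2.word
                        and clause_ok(np1.clause == np2.clause)):
                    out = list(tokens)
                    out[j] = Token(PRONOUN[np2.word] + suffix, np2.clause)
                    return out, True
        return tokens, False
    return rule

def derive(tokens, rules):
    """Apply each rule obligatorily (until it no longer matches), in order."""
    for rule in rules:
        applied = True
        while applied:
            tokens, applied = rule(tokens)
    return " ".join(t.word for t in tokens)

def outputs(rules, sentence):
    """Surface strings obtained under every ordering of the rules."""
    return {derive(sentence, list(order)) for order in permutations(rules)}

reflexivization = make_rule("self", lambda same: same)          # rule (1)
pronominalization = make_rule("", lambda same: True)            # rule (2) as stated
pronominalization_rider = make_rule("", lambda same: not same)  # (2) plus the rider

s3 = [Token("John", 0, True), Token("washed", 0), Token("John", 0, True)]
s4 = [Token("John", 0, True), Token("believes", 0), Token("that", 1),
      Token("Mary", 1, True), Token("washed", 1), Token("John", 1, True)]

print(sorted(outputs([reflexivization, pronominalization], s3)))
# ['John washed him', 'John washed himself']
print(sorted(outputs([reflexivization, pronominalization_rider], s3)))
# ['John washed himself']
print(sorted(outputs([reflexivization, pronominalization_rider], s4)))
# ['John believes that Mary washed him']
```

With (2) as stated, free ordering lets the locally bound him through; with the rider, the two clause conditions partition the pairs, so the order no longer affects the output in these examples.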

In sum: for LK the syntax outputs antecedent-anaphor dependencies. This is explicitly and directly coded in the relevant binding rules.  The proposal has two central features: an economy condition in the guise of the preference for reflexivization over pronominalization and a distinction between formatives that enter derivations via rules like Lexical Insertion (e.g. words like cat, dog, the, this, deictic pronouns, etc.) and those that are the morphological by-products of rules of grammar (e.g. words like himself and certain bound hims that are morphological residues of established anaphoric dependencies).

[1] It must code more than this, however, for otherwise (2) could apply to the output of (1). To block this, it would suffice to assume that some kind of syntactic identity is also required, e.g. that the two be tokens of the same type. For further discussion cf. Hornstein 2001 and note 3.
[2] Figuring out how to make this clear led to problems with the original LK account. For example, how exactly is (i) to be coded? It does not semantically express (ii).

(i)            Everyone hugged himself
(ii)          Everyone hugged everyone
Interestingly, this problem for the LK theory has an answer in contemporary minimalist approaches if we take binding to be a chain relation. In effect, the difference between the underlying forms of (i) and (ii) is that the latter has two selections of everyone in the numeration while the former has one. In other words, if we treat (1) and (2) as morphological spell-out rules defined over chains, this problem disappears. Cf. Drummond 2011, Idsardi and Lidz 1997, and Hornstein 2001 for discussion. We return to this point later on.
[3] Lasnik’s 1976 proposal for an anti-co-reference rule is built around the problems regarding “accidental” co-reference that this fact entails. Contemporary attempts to return to the LK vision have roughly followed Reinhart in assuming that the possibility of grammatical binding restricts extra-grammatical co-reference options.
[4] We must still assume that they are obligatory, but this is to block principle C effects (e.g. John saw John, and John said that Mary likes John) rather than to assure the complementarity of reflexives and bound pronouns.
[5] This is very much a Distributed Morphology conception, though in earlier theoretical guise.
[6] Recall that this assumption creates problems for quantified NP antecedents, as remarked in footnote 2.
[7] Cf. Evans and Reinhart, among others. Note, in addition, that there are virtually no extant cases of inverse binding, i.e. where a pronoun is anaphorically dependent on an antecedent it c-commands. Furthermore, even WCO configurations would seem to be underivable given the actual rules proposed. Nice as this is, it is worth recalling that this empirical success arises from codifying the stipulation that it is the higher/leftmost copy that is retained and the lower/rightmost copy that gets morphologically altered.

Wednesday, March 8, 2017

Elite science

So much for standing on one another's shoulders and science being a community exercise. It seems that there is real panic among the "elite," so much so that it is becoming important to make sure that everyone understands that all the good things we have in life are thanks to the hard work of the precious few who really are intrinsically better. Here is the latest salvo in this direction. Scientists are under siege for being elitist. The hordes are at the gate.

I would feel a lot better about this blather were it not so evidently self-serving. Here scientists are being pulled into service to protect the status of experts with dubious expertise. Economists are all aflutter because nobody shows them any respect anymore. I cannot imagine why, can you? Experts in politics, polling, terrorism and more are just being dissed endlessly. Oh my!

Sadly, some of this also hits real science. Yes, there is global warming, and yes, it is caused by humans burning fossil fuels. However, when one looks at what is discrediting this kind of research, it is not the masses rising up with pitchforks to pillory scientists. It is large powerful groups (indeed elite organizations) organized with the agnotological agenda of spreading doubt, confusion and ignorance.

Elitism is the view that the betters ought to rule. It really has no place in science. Ideally, it's ideas that should lead. Who has the good ideas changes and the fact that someone had one good idea does not imply that that person's current idea is good as well. Ideally, things should be organized so that influence follows the good ideas. Nobody knows how to make this happen exactly. But that is the ideal. What we don't want is deference. That is terrible. But that's what elites want: deference. And the fact that science and scientists are being recruited to the cause of elitism by pundits indicates that some people must really feel that the world is shaking beneath them. Something to celebrate, IMO.

Tuesday, March 7, 2017

What's going on?

I was flying to Montreal on Sunday reading a recent issue of Science and came across a new piece that shocked me. It deals with the dismissal of Allen Braun from the NIH (here). His group in the NIDCD was investigated by the NIH for violations having to do with the processing of subjects for fMRI experiments and was found wanting. The consequences of this investigation include dismissal for Braun and an embargo of all of the data collected in his lab for the last 25 years. Yes, you read that right. 25. The NIH is not allowing any data collected under Braun's supervision to be used for new publications. This includes not only his own work but also that of Post Docs, Grad students, colleagues, undergrads; whoever did any data collection in his unit.

Now, you may be thinking that this is because the data was fraudulently concocted, just another case of manufacturing bad data. But nope, that's not it at all. Rather, the problem seems to be that Braun may not have been punctilious in processing subjects. Apparently, the NIH has a protocol that requires subjects to get medical OKs to participate (a reasonable enough requirement). Braun is accused of being sloppy wrt signing off on some of these. Here's what the NIH audit found:
The audit, which is dated February 2016 and which Science obtained, noted that Braun had not signed off on histories and physicals for 206 of the 424 volunteers whose records the audit examined. But the audit also noted that of those 206, all but five had received a history and physical elsewhere at the agency, because they were participating in other NIH studies, too.
There were other apparent paperwork issues as well, but as the Science piece points out:
 There is no evidence, they have argued in letters to NIH officials, that the violations compromised the bulk of the data or the safety of study volunteers.
Nonetheless, the data collected cannot be used by anyone, thereby derailing a lot of basic research, and, I might add, suggesting that there is something tainted in the data and/or that Braun did something fraudulent and/or immoral. Note I say "suggesting." Importantly there is no hint of evidence supporting this conclusion, but the severity of the punishment and quiescence of the NIH in responding to objections invites the suspicion that there is more to the story than is being reported.

Now note: it is agreed that nothing untoward happened due to this alleged "sloppiness." There is not even a charge that the sloppiness was deliberate. Indeed, it seems that the biggest charge had to do with 206 histories and physicals Braun did not sign off on, but of these, 201 were done and signed off on in other NIH labs. So, we are looking at wrecking or, at least, seriously damaging many careers for what appear to be trivial reasons. The reasons may be more serious, but the NIDCD is not talking. They are taking the CYA position that a legal proceeding is pending, so they are staying mum.

And, the NIH (more exactly, the NIDCD) will not back down. It seems that they really don't care about the consequences of their embargo for young investigators. Interestingly, the NIH doesn't seem to want to antagonize published authors, for the NIDCD is not asking that papers using the same data be retracted if already published. So the data is good enough if out there but not good enough to be put out there. I am having a hard time understanding what justifies the invidious distinction the NIDCD is drawing here between the two kinds of data. Unpublished data, bad. Published data, ok. Same data, different judgments. Why?

I have known Allen Braun for quite a while. He is not a close friend, but he is someone I have talked to quite a bit over the years, and I find it hard to believe that he and his collaborators deserve any of this. I have no idea what the real cause for this extremely harsh (and, from what I can tell from the Science piece, unprecedented) treatment is. But it would be nice to know. Scientists like to tell themselves stories that the enterprise is driven by the noblest values and passions: the desire to know, disinterested curiosity, noble urges towards the truth. But we all know that this is junk. Individual scientists are like everyone else, motivated in many different ways. Scientific bureaucracies are like all others, sometimes self-serving and protective and political. Science is supposed to be built to withstand these motives, not eliminate them. However, science, like all pastimes, can have a less pleasant side. This sure smells like one of those cases. The NIH should examine this and make public what really happened. It's hard for me to believe that the reported infractions justify such a brutal and heavy-handed response. One had better have very good grounds for ruining people's careers.

Friday, March 3, 2017

In case you were wondering

Online education was, IMO, always intended for others. You know, as ways of expanding opportunities for the lower classes to advance (and to milk them in the process). Elite schools aimed to establish brand names that would allow them to compete with the University of Phoenix in bringing education to the masses. Well, it seems that this is running into problems (see here). The products stink. Now there will be calls to monitor their quality and improve them. But really, don't expect much. The goal of the MOOCosphere was to make money. The money would come from bringing Hahvahd educations to those that could not afford the price tag. San Jose State (and, one day, UMD) could become satellite campuses where the best and the brightest would remotely teach the unwashed at arm's length. I wish I could say that I was surprised that the enterprise has hit another big bump in the road. Thank goodness.


Is saying that something is interesting cognitively valuable? I ask this because I recently read a piece (here) suggesting that the term is more emotive than cognitive. Or, as the piece (author: Corey Powell (CP)) puts it, “[c]alling something interesting is the height of sloppy thinking. Interesting is not descriptive, not objective, and not even meaningful.” Thus, though use of the term may carry information, it is subjective information about the user (denoting, perhaps, something like entertainment value (CP: “In practice, interesting is a synonym for entertaining”)) rather than an objective judgment about the intellectual content of the subject matter. Here’s the CP article’s take:

…if someone tells you ‘this is interesting’, remember that they aren’t describing the thing at all. They are describing the effect of that thing on them. Even though we hear it a lot from the would-be Vulcans around us, interesting is a subjective, emotional word, not the objective, logical word we want it to be.

I disagree. Here are some reasons why.

First, I doubt the claim that interesting is merely evaluative. What I mean is that it is perfectly reasonable to ask why someone considers something interesting, while it is far less clear that it is equally apposite to ask why someone considers something entertaining. Entertaining is more like tasty, in that it is a matter of mere taste.[1] Someone’s tastes may be perverse, but they are what they are. Asking someone why they find something entertaining is less asking for information than asking for justification (as in: that is no laughing matter and so not entertaining, and so you shouldn’t be laughing, unless you are a pervert). Interesting is different. Unlike tastes, interests can be defended, criticized, and reasoned about. In particular, it is legit to ask for a defense of one’s interests in polite company. In this sense, then, interesting aspires to cognitive content in ways that entertaining does not.

Second, there are many things worse than being entertaining. Indeed, I would go further: at least when it comes to intellectual matters, the best ideas are vastly entertaining. There is nothing quite like the feeling of delight that accompanies ingesting and digesting a really good insight. It has a palpable taste, and the better the idea the rounder the experience. Think wine, but a whole-body experience. Think sex, but the high lasts longer. Good ideas are entertaining, and that is one very good reason for relentlessly pursuing them.

Let me harp on this. I was recently discussing the emotional vicissitudes of academic life with a grad student. The major downside is that research is deeply unfair. Reward is not guaranteed to the just or the hard working. Lazy shits can, and do, succeed. The reason is that there is no unity to the good: bad people can be smart, beautiful humans can be lazy, lazy people can be lucky, virtuous people can be dumb. Thus, the fact that you have done everything “right” and have lived a righteous research life does not mean that you will ever run into a good idea. But, and this was what we discussed, the possibility of doing so is what drives many of the academics I know. Once you’ve tripped over one, or one has snuck up and grabbed you, the sensation is so fantastic and intoxicating that the (often hopeless) pursuit of others of the same ilk (ideas that will generate a similar visceral sensation) becomes addictive. So, never let anyone tell you that intellectual work should not be entertaining to be serious. Truth may or may not be beauty, but deep ideas really are entertaining, which is one good reason to look for them.

Does this mean that everything that is entertaining is a good idea? No, there is such a thing as cheap entertainment (and I don’t reject cheap entertainment either, but it’s not the same thing) and its joys are different. But, I do think that being entertaining in the right way is a mark of a serious idea and, hence, a useful symptom of one.

Third, evaluations of interest serve an important function in research. Interesting is a predicate mainly of ideas, as opposed to facts. This is not to say that facts might not be interesting, but I think that their interest is at one remove.  They are interesting for what they might tell us or suggest about theories, ideas, hypotheses. In and of themselves, facts are, well, facts.  Being interesting lies not primarily in what there is, but in why what there is is the way that it is. And this requires explanations. Theories, hypotheses, guesses, conjectures, thought!! And that is why interesting is an important adjective and not, as the CP article claims, just so much “linguistic connective tissue.”

Again, let me elaborate. As I’ve lamented many times before, linguists tend to undervalue theory. There is a “just the facts ma’am” attitude abroad in the land where corralling a stray data point is taken to be the most important thing research can hope to achieve. Interesting is the adjective of choice for the contrary attitude. We don’t have nearly enough questions of the following sort: Why is what you are doing interesting? Why should anyone be interested in this? Why should I care? Questions like these force explanatory concerns onto the table. And as the aim of research is to explain (with description being in service of explanation), asking why a proposal is interesting is asking for the proposal’s explanatory oomph. And, sad to say, many advancing a proposal have little idea of why or what makes it interesting. The problem, then, is not with the term. Asking for interesting is asking about/for a real, abstract but objective, facet of ideas. The problem is that too many people have no idea how to explain why and how a proposal is of interest. Many would rather dwell on the data coverage and forget about the stories that make sense of the data and that the facts should be in service of. Many think that data speak for themselves, or at least should do so. Many think that data, especially Big Data, will make interesting superfluous. Many hope that we can mechanize thought and eliminate imagination. Many many many hate theory and think it pretentious. These same many distrust interesting because it adverts to theory. IMO, that is too bad.

So, do I think that CP has gotten it wrong? Not entirely. Part of what CP says seems right to me: we have lost our grip on how the term ought to be used. Ideas are objectively interesting or not. Proposals can be ranked wrt their oomph scale. Science amounts to more than collecting, cleaning, and arranging facts. Science aims to explain, which is why we prize it. When done well it provides cognitive kicks with their own special flavor. When done well, it is interesting and entertaining. The hollowing out of interesting is yet another sign that the explanatory aims of science (and intellectual life more generally) are under data siege. It is a leading indicator of the rise of a pernicious Empiricism, one that takes theory and hypothesis to be little more than a way of summarizing the facts, and hence something once removed from what is real. And one main problem with this view is that it is so uninteresting.

[1] I say mere here deliberately. Taste is often worth debating. So de gustibus disputandum est. In fact, it is often the most important thing to debate. But this is less true for low-level tastes (maybe better termed preferences). I like vanilla, you don’t. What’s to debate? But I like theories that shed light on FL and you don’t, THAT I am happy to debate till the cows come home, and then some.

Thursday, February 23, 2017

Optimal Design

In a recent book (here), Chomsky wants to run an argument to explain why Merge, the Basic Operation, is so simple. Note the ‘explain’ here. And note how ambitious the aim. It goes beyond explaining the “Basic Property” of language (i.e. that natural language Gs (NLG) generate an unbounded number of hierarchically structured objects that are both articulable and meaningful) by postulating the existence of an operation like Merge. It goes beyond explaining why NLGs contain both structure building and displacement operations and why displacement is necessarily to c-commanding positions and why reconstruction is an option and why rules are structure dependent. These latter properties are explained by postulating that NLGs must contain a Merge operation and arguing that the simplest possible Merge operation will necessarily have these properties. Thus, the best Merge operation will have a bunch of very nice properties.

This latter argument is interesting enough. But in the book Chomsky goes further and aims to explain “[w]hy language should be optimally designed…” (25). Or to put this in Merge terms, why should the simplest possible Merge operation be the one that we find in NLGs? And the answer Chomsky is looking for is metaphysical, not epistemological.

What’s the difference? It’s roughly this: even granted that Chomsky’s version of Merge is the simplest, and granted that on methodological grounds simple explanations trump more complex ones, the question remains: given all of this, why should the conceptually simplest operation be the one that we in fact have? Why should methodological superiority imply truth in this case? That’s the question Chomsky is asking and, IMO, it is a real doozy and so worth considering in some detail.

Before starting, a word about the epistemological argument. We all agree that simpler accounts trump more complex ones. Thus, if some account A involves fewer assumptions than some alternative account A’, then if both are equal in their empirical coverage (btw, none of these ‘if’s ever hold in practice, but were they to hold then…) we all agree that A is to be preferred to A’. Why? Well, because in an obvious sense there is more independent evidence in favor of A than there is for A’, and we all prefer theories whose premises have the best empirical support. To get a feel for why this is so, let’s analogize hypotheses to stools. Say A is a three-legged and A’ a four-legged stool. Say that evidence is weight that these stools support. Given a constant weight, each leg of the A stool supports more weight than each leg of the A’ stool: a third of the load rather than a quarter, a difference of about 8% of the total weight. So each of A’s assumptions is better empirically supported than each of those made by A’. Given that we prefer theories whose assumptions are better supported to those that are less well supported, A wins out.[1]
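A worked version of the stool arithmetic, under the idealizing assumption (mine, not stated in the text) that the total weight W is shared equally among the legs. The "about 8%" figure is the per-leg difference expressed as a share of W; measured relative to what one leg of A’ carries, each leg of A carries a third more.

```latex
\[
\frac{W}{3} - \frac{W}{4} = \frac{W}{12} \approx 0.083\,W,
\qquad
\frac{W/3}{W/4} = \frac{4}{3}
\]
```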

None of this is suspect. However, none of this implies that the simpler theory is the true one. The epistemological privilege carries metaphysical consequences only if buttressed by the assumption that empirically better supported accounts are more likely to be true, and, so far as I know, there is actually no obvious story as to why this should be the case, short of asking Descartes’s God to guarantee that our clear and distinct ideas carry ontological and metaphysical weight. A good and just God would not deceive us, would she?

Chomsky knows all of this and indeed often argues in the conventional scientific way from epistemological superiority to truth. So, he often argues that Merge is the simplest operation that yields unbounded hierarchy with many other nice properties and so Merge is the true Basic Operation. But this is not what Chomsky is attempting here. He wants more! Hence the argument is interesting.[2]

Ok, Chomsky’s argument. It is brief and not well fleshed out, but again it is interesting. Here it is, my emphasis throughout (25).

Why should language be optimally designed, insofar as the SMT [Strong Minimalist Thesis, NH] holds? This question leads us to consider the origins of language. The SMT hypothesis fits well with the very limited evidence we have about the emergence of language, apparently quite recently and suddenly in the evolutionary time scale…A fair guess today…is that some slight rewiring of the brain yielded Merge, naturally in its simplest form, providing the basis for unbounded and creative thought, the “great leap forward” revealed in the archeological record, and the remarkable difference separating modern humans from their predecessors and the rest of the animal kingdom. Insofar as the surmise is sustainable, we would have an answer to questions about apparent optimal design of language: that is what would be expected under the postulated circumstances, with no selectional or other pressures operating, so the emerging system should just follow laws of nature, in this case the principles of Minimal Computation – rather the way a snowflake forms.

So, the argument is that the evolutionary scenario for the emergence of FL (in particular its recent vintage and sudden emergence) implies that whatever emerged had to be “simple,” and to the degree we have the evo scenario right, we have an account of why Merge has the properties it has (i.e. recency and suddenness implicate a simple change).[3] Note, again, that this goes beyond any methodological argument for Merge. It aims to derive Merge’s simple features from the nature of selection and the particulars of the evolution of language. Here Darwin’s Problem plays a very big role.

So how good is the argument? Let me unpack it a bit more (and here I will be putting words into Chomsky’s mouth, always a fraught endeavor (think lions and tamers)). The argument appears to make a four-way identification: conceptual simplicity = computational simplicity = physical simplicity = biological simplicity. Let me elaborate.

The argument is that Merge in its “simplest form” is an operation that combines expressions into sets of those expressions. Thus, for any A, B: Merge (A, B) yields {A, B}. Why sets? Well, the argument is that sets are the simplest kinds of complex objects there are. They are simpler than ordered pairs in that the things combined are not ordered, just combined. Also, the operation of combining things into sets does not change the expressions so combined (no tampering). So the operation is arguably as simple a combination operation as one can imagine. The assumption is that the rewiring that occurred triggered the emergence of the conceptually simplest operation. Why?
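To make the contrast between sets and ordered pairs concrete, here is a toy model. It is only a sketch under stated assumptions: lexical items are plain strings, `merge`, `pair` and `depth` are hypothetical names, and Python’s `frozenset` stands in for the set {A, B} (it is immutable and hashable, so Merge can reapply to its own output without altering its inputs):

```python
def merge(a, b):
    """Hypothetical model of Merge: form the unordered set {a, b}.

    frozenset is immutable and hashable, so the output can itself be merged,
    and the inputs a and b are left exactly as they were (no tampering).
    """
    return frozenset([a, b])

def pair(a, b):
    """Contrast case: an ordered combination, where (a, b) != (b, a)."""
    return (a, b)

def depth(obj):
    """Embedding depth: lexical items are 0; each application of Merge adds 1."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(x) for x in obj)
    return 0

# Toy lexical items are plain strings (an illustrative assumption).
vp = merge("eat", "apples")
assert vp == merge("apples", "eat")              # sets impose no order
assert pair("eat", "apples") != pair("apples", "eat")

# Merge reapplies to its own output, building hierarchy without bound.
tp = merge("will", vp)
cp = merge("that", tp)
assert vp in tp and tp in cp                     # earlier objects survive intact
print(depth(vp), depth(tp), depth(cp))           # 1 2 3
```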

Step two: say that conceptually simple operations are also computationally simple. In particular, assume that it is computationally less costly to combine expressions into simple sets than to combine them as ordered elements (e.g. ordered pairs). If so, the conceptually simpler an operation, the less computational effort required to execute it. So, simple concepts imply minimal computations, and physics favors the computationally minimal. Why?

Step three: identify computational with physical simplicity. This puts some physical oomph into “least effort”; it’s what makes minimal computation minimal. Now, as it happens, there are physical theories that tie issues in information theory to physical operations (e.g. erasure of information plays a central role in explaining why Maxwell’s demon cannot compute its way to entropy reversal (see here on the Landauer Limit)).[4] The argument seems to assume something similar, something tying computational simplicity to the minimization of some physical magnitude. In other words, say computationally efficient systems are also physically efficient, so that minimizing computation affords physical advantage (minimizes some physical variable). The snowflake analogy plays a role here, I suspect, the idea being that just as snowflakes arrange themselves in a physically “efficient” manner, simple computations are also more physically efficient in some sense to be determined.[5] And physical simplicity has biological implications. Why?
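To give a sense of the kind of computation-to-physics link at issue: Landauer’s principle says that erasing one bit of information at temperature T dissipates at least k_B T ln 2 of energy, roughly 3 × 10^-21 J per bit at room temperature. The snippet below just evaluates that bound; `landauer_bound` is a hypothetical helper, and nothing in it says what physical magnitude, if any, Merge’s simplicity minimizes:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by definition in the 2019 SI)

def landauer_bound(temperature_k, bits_erased=1):
    """Minimum energy in joules dissipated by erasing `bits_erased` bits at temperature T."""
    return bits_erased * K_B * temperature_k * math.log(2)

room = landauer_bound(300)          # roughly room temperature
print(f"{room:.3e} J per bit erased at 300 K")
print(f"{landauer_bound(300, 8):.3e} J per byte erased at 300 K")
```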

The last step: biological complexity is a function of natural selection, thus if no selection, no complexity. So, one expects biological simplicity in the absence of selection, the simplicity being the direct reflection of simply “follow[ing] the laws of nature,” which just are the laws of minimal computation, which just reflect conceptual simplicity.

So, why is Merge simple? Because it had to be! It’s what physics delivers in biological systems in the absence of selection, informational simplicity tied to conceptual simplicity and physical efficiency. And there could be no significant selection pressure because the whole damn thing happened so recently and suddenly.

How good is this argument? Well, let’s just say that it is somewhat incomplete, even given the motivating starting points (i.e. the great leap forward).

Before some caveats, let me make a point about something I liked. The argument relies on a widely held assumption, namely that complexity is a product of selection and that this requires long stretches of time.  This suggests that if a given property is relatively simple then it was not selected for but reflects some evolutionary forces other than selection. One aim of the Minimalist Program (MP), one that I think has been reasonably well established, is that many of the fundamental features of FL and the Gs it generates are in fact products of rather simple operations and principles. If this impression is correct (and given the slippery nature of the notion “simple” it is hard to make this impression precise) then we should not be looking to selection as the evolutionary source for these operations and principles.

Furthermore, this conclusion makes independent sense. Recursion is not a multi-step process, as Dawkins among others has rightly insisted (see here for discussion), and so it is the kind of thing that plausibly arose (or could have arisen) from a single mutation. This means that properties of FL that follow from the Basic Operation will not themselves be explained as products of selection. This is an important point for, if correct, it argues that much of what passes for contemporary work on the evolution of language is misdirected. To the degree that a property is “simple,” Darwinian selection mechanisms are beside the point. Of course, which features are simple is an empirical issue, one that lots of ink has been dedicated to addressing. But the more mid-level features of FL a “simple” FL explains, the less reason there is for thinking that the fine structure of FL evolved via natural selection. And this goes completely against current research in the evo of language. So hooray.

Now for some caveats: First, it is not clear to me what links conceptual simplicity with computational simplicity. A question: versions of the propositional calculus based on negation and disjunction or on negation and conjunction are expressively equivalent. Indeed, one can get away with just one primitive Boolean operation, the Sheffer Stroke (see here). Is this last system more computationally efficient than one with two primitive operations, negation plus either conjunction or disjunction? Is one with three (negation, disjunction and conjunction) worse? I have no idea. The more primitives we have, the shorter proofs can be. Does this save computational power? How about sets versus ordered pairs? Is having both computationally profligate? Is there reason to think that a “small rewiring” can bring forth a NAND gate but not a negation gate and a conjunction gate? Is there reason to think that a small rewiring naturally begets a Merge operation that forms sets but not one that would form, say, ordered pairs? I have no idea, but the step from conceptually simple to computationally more efficient does not seem to me to be straightforward.
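The Sheffer Stroke claim is easy to verify mechanically. A minimal sketch (all function names are illustrative): it defines negation, conjunction and disjunction from NAND alone, checks them against every row of the truth tables, and counts NAND applications as a crude stand-in for computational cost (an assumption, not a standard measure). On that count, conjunction takes two NANDs and disjunction three, where a system with those connectives as primitives needs one each:

```python
from itertools import product

nand_calls = 0  # counts NAND applications, as a crude proxy for "cost"

def nand(p, q):
    """The Sheffer Stroke: false only when both p and q are true."""
    global nand_calls
    nand_calls += 1
    return not (p and q)

# Negation, conjunction and disjunction defined from NAND alone.
def neg(p):
    return nand(p, p)

def conj(p, q):
    return neg(nand(p, q))           # NOT (p NAND q)

def disj(p, q):
    return nand(neg(p), neg(q))      # (NOT p) NAND (NOT q), by De Morgan

# Check every row of the truth tables against Python's own connectives.
for p, q in product([False, True], repeat=2):
    assert neg(p) == (not p)
    assert conj(p, q) == (p and q)
    assert disj(p, q) == (p or q)

def nand_cost(f, *args):
    """How many NAND applications it takes to evaluate f(*args)."""
    global nand_calls
    nand_calls = 0
    f(*args)
    return nand_calls

print(nand_cost(neg, True), nand_cost(conj, True, False), nand_cost(disj, True, False))  # 1 2 3
```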

Second, why think that the simplest biological change did not build on pre-existing wiring? It is not hard to imagine that non-linguistic animals have something akin to a concatenation operation. Say they do. Then one might imagine that it is just as “simple” to modify this operation to deliver unbounded hierarchy as it is to add an entirely different operation that does so. So even if a set-forming operation were simpler than concatenation tout court (which I am not sure is so), it is not clear that it is biologically simpler to ignore this available operation and introduce an entirely new one (Merge) than it is to derive hierarchical recursion from a modified conception of concatenation, given that concatenation already obtains in the organism. If it isn’t (and how to tell, really?), then the emergence of Merge is surprising, given that there might be a simpler evolutionary route to the same functional end (unbounded hierarchical objects via descent with modification, in this case modification of concatenation).[6]
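One way to see what a modification of concatenation would have to add: flat concatenation is associative, so its output records no hierarchy, while set formation keeps different groupings distinct. A minimal sketch, assuming “concatenation” means end-to-end joining of strings (the text does not pin the notion down); `concat` and `merge` are hypothetical names, and the example says nothing about which operation is biologically simpler:

```python
def concat(x, y):
    """Hypothetical flat concatenation: join two strings end to end."""
    return x + y

def merge(a, b):
    """Set-forming combination {a, b}; a and b are left intact."""
    return frozenset([a, b])

# Concatenation is associative: both bracketings give the same flat string,
# so nothing in the output records how the parts were grouped.
left = concat(concat("a", "b"), "c")
right = concat("a", concat("b", "c"))
print(left, right, left == right)     # abc abc True

# Set formation keeps the two groupings apart: {{a, b}, c} vs {a, {b, c}}.
left_tree = merge(merge("a", "b"), "c")
right_tree = merge("a", merge("b", "c"))
print(left_tree == right_tree)        # False
```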

Third, the relation between complexity of computation and physical simplicity is not crystal clear for the case at hand. What physical magnitude is being minimized when computations are more efficient? There is a branch of complexity theory where real physical magnitudes (time, space) are considered, but this is not the kind of consideration that Chomsky has generally thought relevant. Thus, there is a gap that needs more than rhetorical filling: what links the computational intuitions with physical magnitudes?

Fourth, how good are the motivating assumptions provided by the great leap forward? The argument is built by assuming that Merge is what gets the great leap forward leaping. In other words, the cultural artifacts serve as a proxy for the time of the “slight rewiring” that afforded Merge and thereby allowed for FL and NLGs. Thus the recent, sudden dating of the great leap forward is the main evidence for dating the slight change. But why assume that the proximate cause of the leap is a rewiring relevant to Merge rather than, say, the rewiring that licenses externalization of the Mergish thoughts so that they can be communicated?

Let me put this another way. I have no problem believing that the small rewiring can stand independent of externalization and be of biological benefit. But even if one believes this, it may be that large-scale cultural artifacts are the product not just of the rewiring but of the capacity to culturally “evolve,” and models of cultural evolution generally take communicative language to be the necessary medium for cultural evolution. So, the great leap forward might be less a proxy for Merge than for whatever allowed for the externalization of FL-formed thoughts. If this is so, then it is not clear that the sudden emergence of cultural artifacts shows that Merge is relatively recent. It shows, rather, that whatever drove rapid cultural change is relatively recent, and this might not be Merge per se but the processes that allowed for the externalization of Merge-generated structures.

So how good is the whole argument? Well let’s say that I am not that convinced. However, I admire it for it tries to do something really interesting. It tries to explain why Merge is simple in a perfectly natural sense of the word.  So let me end with this.

Chomsky has made a decent case that Merge is simple in that it involves no tampering, is a very simple “conjoining” operation yielding hierarchical sets of unbounded size, and has other nice properties (e.g. displacement, structure dependence). I think that Chomsky’s case for such a Merge operation is pretty nice (not perfect, but not at all bad). What I am far less sure of is that it is possible to take the next step fruitfully: explain why Merge has these properties and not others. This is the aim of Chomsky’s very ambitious argument here. Does it work? I don’t see it (yet). Is it interesting? Yup! Vintage Chomsky.

[1] All of this can be given a Bayesian justification as well (which is what lies behind derivations of the subset principle in Bayes accounts) but I like my little analogy so I leave it to the sophisticates to court the stately Reverend.
[2] Before proceeding it is worth noting that Chomsky’s argument is not just a matter of axiom counting as in the simple analogy above. It involves more recondite conceptions of the “simplicity” of one’s assumptions. Thus even if the number of assumptions is the same it can still be that some assumptions are simpler than others (e.g. the assumption that a relation is linear is “simpler” than that a relation is quadratic). Making these arguments precise is not trivial. I will return to them below.
[3] So does the fact that FL has been basically stable in the species ever since it emerged (or at least since humans separated). Note, the fact that FL did not continue to evolve after the trek out of Africa also suggests that the “simple” change delivered more or less all of what we think of as FL today. So, it’s not like FLs differ wrt Binding Principles or Control theory but are similar as regards displacement and movement locality. FL comes as a bundle and this bundle is available to any kid learning any language.
[4] Let me fess up: this is WAY beyond my understanding.
[5] What do snowflakes optimize? Consider the following (see here, my emphasis [NH]):

The growth of snowflakes (or of any substance changing from a liquid to a solid state) is known as crystallization. During this process, the molecules (in this case, water molecules) align themselves to maximize attractive forces and minimize repulsive ones. As a result, the water molecules arrange themselves in predetermined spaces and in a specific arrangement. This process is much like tiling a floor in accordance with a specific pattern: once the pattern is chosen and the first tiles are placed, then all the other tiles must go in predetermined spaces in order to maintain the pattern of symmetry. Water molecules simply arrange themselves to fit the spaces and maintain symmetry; in this way, the different arms of the snowflake are formed.

[6] Shameless plug: this is what I try to do here, though strictly speaking concatenation here is not among objects in a 2-space but in a 3-space (hence it results in “concatenated” objects with no linear implications).