Friday, December 2, 2016

What's a minimalist analysis

The proliferation of handbooks on linguistics identifies a gap in the field. There are so many now that there is an obvious need for a handbook of handbooks consisting of papers that are summaries of the various handbook summaries. And once we take this first tiny recursive step, as you all know, sky’s the limit.

You may be wondering why this thought crossed my mind. Well, it’s because I’ve been reading some handbook papers recently and many of those that take a historical trajectory through the material often have a penultimate section (before a rousing summary conclusion) with the latest minimalist take on the relevant subject matter. So, we go through the Standard Theory version of X, the Extended Standard Theory version, the GB version and finally an early minimalist and late minimalist version of X. This has naturally led me to think about the following question: what makes an analysis minimalist? When is an analysis minimalist and when not? And why should one care?

Before starting let me immediately caveat this. Being true is the greatest virtue an analysis can have. And being minimalist does not imply that an analysis is true. So not being minimalist is not in itself necessarily a criticism of any given proposal. Or at least not a decisive one. However, it is, IMO, a legit question to ask of a given proposal whether and how it is minimalist. Why? Well because I believe that Darwin’s Problem (and the simplicity metrics it favors) is well-posed (albeit fuzzy in places) and therefore that proposals dressed in assumptions that successfully address it gain empirical credibility. So, being minimalist is a virtue and suggestive of truth, even if not its guarantor.[1]

Perhaps I should add that I don’t think that anything guarantees truth in the empirical sciences and that I also tend to think that truth is the kind of virtue that one only gains slantwise. What I mean by this is that it is the kind of goal one attains indirectly rather than head on. True accounts are ones that economically cover reasonable data in interesting ways, shed light on fundamental questions and open up new avenues for further research.[2] If a story does all of that pretty well then we conclude it is true (or well on its way to it). In this way truth is to theory what happiness is to life plans. If you aim for it directly, you are unlikely to get it. Sort of like trying to fall asleep. As insomniacs will tell you, that doesn’t work.

That out of the way, what are the signs of a minimalist analysis (MA)? We can identify various grades of minimalist commitment.

The shallowest is technological minimalism. On this conception an MA is minimalist because it expresses its findings in terms of ‘I-merge’ rather than ‘move,’ ‘phases’ rather than ‘bounding nodes’/‘barriers,’ or ‘Agree’ rather than ‘binding.’ There is nothing wrong with this. But depending on the details there need not be much that is distinctively minimalist here. So, for example, there are versions of phase theory (so far as I can tell, most versions) that are isomorphic to previous GB theories of subjacency, modulo the addition of v as a bounding node (though see Barriers). The second version of the PIC (i.e. where Spell Out is delayed to the next phase) is virtually identical to 1-subjacency and the number of available phase edges is identical to the specification of “escape hatches.”

Similarly for many Agree based theories of anaphora and/or control. In place of local coindexing we express the identical dependency in terms of Agree in probe/goal configurations (antecedents as probes, anaphors as goals)[3] subject to some conception of locality. There are differences, of course, but largely the analyses inter-translate and the novel nomenclature serves to mask the continuity of the proposed account with prior analyses. In other words, what makes such analyses minimalist is less a grounding in basic features of the minimalist program than a technical isomorphism between current and earlier technology. Or, to put this another way, when successful, such stories tell us that our earlier GB accounts were no less minimalist than our contemporary ones. Or, to put this yet another way, our current understanding is no less adequate than our earlier understanding (i.e. we’ve lost nothing by going minimalist). This is nice to know, but given that we thought that GB left Darwin’s Problem (DP) relatively intact (this being the main original motivation for going Minimalist, i.e. beyond explanatory adequacy), analyses that are effectively the same as earlier GB analyses likely leave DP in the same opaque state. Does this mean that translating earlier proposals into current idiom is useless? No. But such translations often make only a modest contribution to the program as a whole given the suppleness of current technology.

There is a second more interesting kind of MA. It starts from one of the main research projects that minimalism motivates. Let’s call this “reductive” or “unificational minimalism” (UM). Here’s what I mean.

The minimalist program (MP) starts from the observation that FL is a fairly recent cognitive novelty and thus what is linguistically proprietary is likely to be quite meager. This suggests that most of FL is cognitively or computationally general, with only a small linguistically specific residue. This suggests a research program given a GB backdrop (see here for discussion). Take the GB theory of FL/UG to provide a decent effective theory (i.e. descriptively pretty good but not fundamental) and try to find a more fundamental one that has these GB principles as consequences.[4] This conception provides a two pronged research program: (i) eliminate the internal modularity of GB (i.e. show that the various GB modules are all instances of the same principles and operations (see here)) and (ii) show that of the operations and principles that are required to effect the unification in (i), all save one are cognitively and/or computationally generic. If we can successfully realize this research project then we have a potential answer to DP: FL arose with the adventitious addition of the linguistically proprietary operation/principle to the cognitive/computational apparatus the species antecedently had.

Those are the main contours of the research program. UM concentrates on (i) and aims to reduce the different principles and operations within FL to the absolute minimum. It does this by proposing to unify domains that appear disparate on the surface and by reducing G options to an absolute minimum.[5] A reasonable heuristic for this kind of MA is the idea that Gs never do things in more than one way (e.g. there are not two ways (viz. via matching or raising) to form relative clauses). This is not to deny that different surface patterns obtain, only that they are not the products of distinctive operations.

Let me put this another way: UM takes the GB disavowal of constructions to the limit. GB eschewed constructions in that it eliminated rules like Relativization and Topicalization, seeing both as instances of movement. However, it did not fully eliminate constructions for it proposed very different basic operations for (apparently) different kinds of dependencies. Thus, GB distinguishes movement from construal and binding from control and case assignment from theta checking. In fact, each of the modules is defined in terms of proprietary primitives, operations and constraints. This is to treat the modules as constructions. One way of understanding UM is that it is radically anti-constructivist and recognizes that all G dependencies are effected in the same way. There is, grammatically speaking, only ever one road to Rome.

Some of the central results of MP are of this ilk. So, for example, Chomsky’s conception of Merge unifies phrase structure theory and movement theory. The theory of case assignment in the Black Book unifies case theory and movement theory (the former being just a specific reflex of movement) in much the way that move alpha unifies question formation, relativization, topicalization etc. The movement theory of control and binding unifies both modules with movement. The overall picture then is one in which binding, structure building, case licensing, movement, and control “reduce” to a single computational basis. There aren’t movement rules versus phrase structure rules versus binding rules versus control rules versus case assignment rules. Rather these are all different reflexes of a single Merge-effected dependency with different features being licensed via the same operation. It is the logic of On wh movement writ large.
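To make the Merge unification concrete, here is a minimal sketch, assuming a bare set-theoretic Merge and a toy clause. The function names (`merge`, `terms`, `is_internal_merge`) and the example sentence are illustrative inventions, not anything from the literature. The point it shows is that "phrase structure building" and "movement" can be the same operation, differing only in whether the merged item is already contained in its host.

```python
def merge(a, b):
    """Form the unordered set {a, b}; the only structure-building operation here."""
    return frozenset({a, b})

def terms(obj):
    """Yield obj and, recursively, every object contained in it."""
    yield obj
    if isinstance(obj, frozenset):
        for member in obj:
            yield from terms(member)

def is_internal_merge(host, item):
    """True if item is already a proper term of host (i.e. the merge is 'movement')."""
    return item != host and any(t == item for t in terms(host))

# Toy derivation (hypothetical): "what [Frank [took what]]"
vp = merge("took", "what")   # external merge: two independent objects
tp = merge("Frank", vp)      # external merge again
cp = merge("what", tp)       # internal merge: "what" is re-merged from inside tp

print(is_internal_merge("took", "what"))  # external case
print(is_internal_merge(tp, "what"))      # internal case
```

Nothing in `merge` itself distinguishes the two cases. The re-merged item simply occurs twice in the result, which is roughly what the copy theory of movement amounts to in this toy setting.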

There are other examples of the same “less is more” logic: The elimination of D-structure and S-structure in the Black Book, Sportiche’s recent proposals to unify promotion and matching analyses of relativization, unifying reconstruction and movement via the copy theory of movement (in turn based on a set theoretic conception of Merge), Nunes’ theory of parasitic gaps, and Sportiche’s proposed elimination of late merger to name five. All of these are MAs in the specific sense that they aim to show that rich empirical coverage is compatible with a reduced inventory of basic operations and principles and that the architecture of FL as envisioned in GB can be simplified and unified thereby advancing the idea that a (one!) small change to the cognitive economy of our ancestors could have led to the emergence of an FL like the one that we have good (GB) evidence to think is ours. Thus, MAs of the UM variety clearly provide potential answers to the core minimalist DP question and hence deserve their ‘minimalist’ modifier.

The minimalist ambitions can be greater still. MAs have two related yet distinct goals. The first is to show that svelter Gs do no worse than the more complex ones that they replace (or at least don’t do much worse).[6] The second is to show that they do better. Chomsky contrasted these in chapter three of the Black Book and provided examples illustrating how doing more with less might be possible. I would like to mention a few by way of illustration, after a brief running start.

Chomsky made two methodological observations. First, if a svelter account does (nearly) empirically as well as a grosser one then it “wins” given MP desiderata. We noted why this was so above regarding DP, but really nobody considers Chomsky’s scoring controversial given that it is a lead footed application of Ockham. Fewer assumptions are always better than more for the simple reason that for a given empirical payoff K an explanation based on N assumptions leaves each assumption with greater empirical justification than one based on N+1 assumptions. Of course, things are hardly ever this clean, but often they are clean enough and the principle is not really contestable.[7]
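One crude way to picture the counting argument, assuming purely for illustration that the empirical payoff K is shared equally among the assumptions that deliver it, is that each of N assumptions earns more support than each of N+1:

```latex
\[
\frac{K}{N} \;>\; \frac{K}{N+1} \qquad \text{for } K > 0,\; N \ge 1 .
\]
```

The equal-sharing idealization is added here only to make the comparison visible; the argument itself needs no more than the observation that the same payoff is being spread over fewer assumptions.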

However, Chomsky’s point extends this reasoning beyond simple assumption counting. For MP it’s not only the number of assumptions that matters but their pedigree. Here’s what I mean. Let’s distinguish FL from UG. Let ‘FL’ designate whatever allows the LAD to acquire a particular G_L based on PLD_L. Let ‘UG’ designate those features of FL that are linguistically proprietary (i.e. not reflexes of more generic cognitive or computational operations). An MA aims to reduce the UG part of FL. In the best case, it contains a single linguistically specific novelty.[8] So, it is not just a matter of counting assumptions. Rather what matters is counting UG (i.e. linguistically proprietary) assumptions. We prefer those FLs with minimal UGs and minimal language specific assumptions.

An example of this is Chomsky’s arguments against D-structure and S-structure as internal levels. Chomsky does not deny that Gs interface with interpretive interfaces, rather he objects to treating these as having linguistically special properties.[9] Of course, Gs interface with sound and meaning. That’s obvious (i.e. “conceptually necessary”). But this assumption does not imply that there need be anything linguistically special about the G levels that do the interfacing beyond the fact that they must be readable by these interfaces. So, any assumption that goes beyond this (e.g. the theta criterion) needs defending because it requires encumbering FL with UG strictures that specify the extras required. 

All of this is old hat, and, IMO, perfectly straightforward and reasonable. But it points to another kind of MA: one that does not reduce the number of assumptions required for a particular analysis, but that reapportions the assumptions between UGish ones and generic cognitive-computational ones. Again, Chomsky’s discussions in chapter 3 of the Black Book provide nice examples of this kind of reasoning, as does the computational motivation for phases and Spell Out.

Let me add one more (and this will involve some self-referentiality). One argument against PRO based conceptions of (obligatory) control is that they require a linguistically “special” account of the properties of PRO. After all, to get the trains to run on time PRO must be packed with features which force it to be subject to the G constraints it is subject to (PRO needs to be locally minimally bound, occurs largely in non-finite subject positions, and has very distinctive interpretive properties). In other words, PRO is a G internal formative with special G sensitive features (often of the possibly unspecified phi-variety) that force it into G relations. Thus, it is MP problematic.[10] Thus a proposal that eschews PRO is prima facie an MA story of control for it dispenses with the requirement that there exists a G internal formative with linguistically specific requirements.[11] I would like to add, precisely because I have had skin in this game, that this does not imply that PRO-less accounts of control are correct or even superior to PRO based conceptions. No! But it does mean that eschewing PRO has minimalist advantages over accounts that adopt PRO as they minimize the UG aspects of FL when it comes to control.

Ok, enough self-promotion. Back to the main point. The point is not merely to count assumptions but to minimize UGish ones. In this sense, MAs aim to satisfy Darwin more than Ockham. A good MA minimizes UG assumptions and does (about) as well empirically as more UG encumbered alternatives. A good sign that a paper is providing an MA of this sort is a manifest concern to minimize the UG nature of the principles assumed.

Let’s now turn to (and end with) the last, most ambitious MA: it is one that not merely does (almost) as well as more UG encumbered accounts, but does better. How can one do better? Recall that we should expect MAs to be more empirically brittle than less minimalist alternatives given that MP assumptions generally restrict an account’s descriptive apparatus.[12] So, how can a svelter account do better? It does so by having more explanatory oomph (see here). Here’s what I mean.

Again, the Black Book provides some examples.[13] Recall Chomsky’s discussion of examples like (1) with structures like (2):

(1)  John wonders how many pictures of himself Frank took
(2)  John wonders [[how many pictures of himself] Frank took [how many pictures of himself]]

The observation is that (1) has an idiomatic reading just in case Frank is the antecedent of the reflexive.[14] This can be explained if we assume that there is no D-structure level or S-structure level. Without these, binding and idiom interpretation must be defined over that G level that is input to the CI interface. In other words, idiom interpretation and binding are computed over the same representation and we thus expect that the requirements of each will affect the possibilities of the other.

More concretely, to get the idiomatic reading of take pictures requires using the lower copy of the wh phrase. To get John as a potential antecedent of the reflexive requires using the higher copy. If we assume that only a single copy can be retained on the mapping to CI, this implies that if take pictures of himself is understood idiomatically, Frank is the only available local antecedent of the reflexive. The prediction relies on the assumption that idiom interpretation and binding exploit the same representation. Thus, by eliminating D-structure, the theory can no longer make D-structure the locus of idiom interpretation and by eliminating S-structure, the theory cannot make it the locus of binding. Thus by eliminating both levels the proposal predicts a correlation between idiomaticity and reflexive antecedence.

It is important to note that a GBish theory where idioms are licensed at D-structure and reflexives are licensed at S-structure (or later) is compatible with Chomsky’s reported data, but does not predict it. The relevant data can be tracked in a theory with the two internal levels. What is missing is the prediction that they must swing together. In other words, the MP story explains what the non-MP story must stipulate. Hence, the explanatory oomph. One gets more explanation with less G internal apparatus.
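The contrast between predicting and merely tracking the correlation can be sketched by brute enumeration. This is a toy model under stated assumptions, not an implementation of either theory: the encoding of (1)/(2), the table `ANTECEDENT_BY_COPY`, and both function names are hypothetical, and the two-level model assumes for illustration that nothing links the level that licenses the idiom to the level that fixes the antecedent.

```python
from itertools import product

# Hypothetical toy encoding of example (1)/(2).
COPIES = ("higher", "lower")

# Local antecedent made available for 'himself' by each copy of the wh-phrase.
ANTECEDENT_BY_COPY = {"higher": "John", "lower": "Frank"}

def idiom_available(copy):
    """The idiomatic 'take pictures' reading needs the copy adjacent to 'took'."""
    return copy == "lower"

def single_level_readings():
    """One retained copy feeds both binding and idiom interpretation."""
    readings = set()
    for copy in COPIES:
        antecedent = ANTECEDENT_BY_COPY[copy]
        readings.add((antecedent, False))      # literal reading
        if idiom_available(copy):
            readings.add((antecedent, True))   # idiomatic reading
    return readings

def two_level_readings():
    """Idiom and binding are checked at separate levels, chosen independently."""
    readings = set()
    for idiom_copy, binding_copy in product(COPIES, repeat=2):
        antecedent = ANTECEDENT_BY_COPY[binding_copy]
        readings.add((antecedent, False))
        if idiom_available(idiom_copy):
            readings.add((antecedent, True))
    return readings

# Pairs are (antecedent of 'himself', idiomatic?).
print(sorted(single_level_readings()))
print(sorted(two_level_readings()))
```

In this sketch the single-level model never generates John as antecedent together with the idiomatic reading, whereas the two-level model generates it unless an extra stipulation rules it out. That is the sense in which the first predicts the correlation and the second only accommodates it.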

There are other examples of this kind of reasoning, but not that many.  One of the reasons I have always liked Nunes’ theory of parasitic gaps is that it explains why they are licensed only in overt syntax. One of the reasons that I like the Movement Theory of Control is that it explains why one finds (OC) PRO in the subject position of non-finite clauses. No stipulations necessary, no ad hoc assumptions concerning flavors of case, no simple (but honest) stipulations restricting PRO to such positions. These are minimalist in a strong sense.

Let’s end here. I have tried to identify three kinds of MAs. What makes proposals minimalist is that they either answer or serve as steps towards answering the big minimalist question: why do we have the FL we have? How did FL arise in the species? That’s the question of interest. It’s not the only question of interest, but it is an important one. Precisely because the question is interesting it is worth identifying whether and in what respects a given proposal might be minimalist. Wouldn’t it be nice if papers in minimalist syntax regularly identified their minimalist assumptions so that we could not only appreciate their empirical virtuosity, but could also evaluate their contributions to the programmatic goals?

[1] If pressed (even slightly) I might go further and admit that being minimalist is a necessary condition of being true. This follows if you agree that the minimalist characterization of DP in the domain of language is roughly accurate. If so, then true proposals will be minimalist for only such proposals will be compatible with the facts concerning the emergence of FL. That’s what I would argue, if pressed.
[2] And if this is so, then the way one arrives at truth in linguistics will plausibly go hand in hand with providing answers to fundamental problems like DP. Thus, proposals that are minimalist may thereby have a leg up on truth. But, again, I wouldn’t say this unless pressed.
[3] The Agree dependency here established is accompanied by a specific rule of interpretation whereby agreement signals co-valuation of some sort. This, btw, is not a trivial extra.
[4] This parallels the logic of On wh movement wrt islands and bounding theory. See here for discussion.
[5] Sportiche (here) describes this as eliminating extrinsic theoretical “enrichments” (i.e. theoretical additions motivated entirely by empirical demands).
[6] Note a priori one expects simpler proposals to be empirically less agile than more complex ones and to therefore cover less data. Thus, if a cut down account gets roughly the same coverage this is a big win for the more modest proposal.
[7] Indeed, it is often hard to individuate assumptions, especially given different theoretical starting points. However (IMO surprisingly), this is often doable in practice so I won’t dwell on it here.
[8] I personally don’t believe that it can contain less for it would make the fact that nothing does language like humans do a complete mystery. This fact strongly implies (IMO) that there is something UGishly special about FL. MP reasoning implies that this UG part is very small, though not null. I assume this here.
[9] That’s how I understand the proposal to eliminate G internal levels.
[10] It is worth noting that this is why PRO in earlier theories was not a lexical formative at all, but the residue of the operation of the grammar. This is discussed in the last chapter here if you are interested in the details.
[11] One more observation: this holds even if the proposed properties of PRO are universal, i.e. part of UG. The problem is not variability but linguistic specificity.
[12] Observe that empirical brittleness is the flip side of theoretical tightness. We want empirically brittle theories.
[13] The distinction between these two kinds of MAs is not original with me but clearly traces to the discussion in the Black Book.
[14] I report the argument. I confess that I do not personally get the judgments described. However, this does not matter for purposes of illustration of the logic.


  1. cool minimalist analysis, you should write about error analysis too

  2. A good book on Minimalism is Rouveret's Arguments Minimalistes (2015, ENS éditions). It is amazing to read it.
