Sunday, August 24, 2014

The Cake is a Lie

In the comments section to Norbert's final remarks on Chomsky's lecture series, Omer Preminger laments the resurrection of representational devices in Minimalism, which started with Chomsky (2004) "Beyond Explanatory Adequacy". I originally took Omer to argue against representational approaches on two levels:
  1. They are fundamentally flawed regarding both competence (bad empirical coverage) and performance (no plausible parsing model).
  2. Phase-theory is ill-motivated and doesn't get the job done.
After a short email conversation with Omer it became clear that this is not an accurate reflection of his views, which he summarizes in this follow-up post. Still, now that we've got those two claims laid out in front of us, let's assess their soundness.

I've got no qualms with the second claim. The motivation and empirical use of phases has frequently been criticized in the literature, and phases don't fare any better from a computational perspective. The memory reduction argument for phases is hogwash, phases have no discernible effect on generative capacity (neither weak nor strong), and they do not simplify the learning problem. Norbert captures the state of affairs ever so succinctly: "Phases don't walk the walk [...] the rhetoric is often ahead of the results."

The first claim, on the other hand, I can't get behind at all. For one thing, it's a typical case of the cake fallacy: Every cake I've made in my life has been horrible, thus all cakes are horrible (and I'm more of a muffins guy anyways). Even if linguists haven't come up with any good representational models so far --- which I think many syntacticians would emphatically disagree with --- that doesn't mean the approach as such is intrinsically inferior to derivational ones. Now I can already see the flurry of posts about how theory construction is also a probabilistic process where previous failures of specific types of analysis make them less preferable, that the existence of a working representational account is moot if we can't find it, yada yada yada. Step back from your keyboards everyone, I'm actually not interested in arguing this point. My real worry is much more basic. The first claim implies, as is commonly done in linguistics, that there is a meaningful difference between representational and derivational accounts. Well, it turns out that this specific cake is a lie.

Representations and Derivations

Those of you with an eidetic memory might remember from an earlier discussion that I have always been flummoxed by the distinction between derivational and representational theories of syntax. Actually, that's not quite true. Back in my undergrad days, when I gravitated towards syntax after being thoroughly disappointed by computational linguistics (but that's a different story), the distinction made perfect sense to me. A representational theory's primary modus operandi is well-formedness conditions on trees, whereas a derivational theory is about the operations that build these trees.

Now whenever you have competing alternatives, it makes sense to test whether they can be distinguished empirically, and if so, whether one is superior to the other. The literature has its fair share of arguments for either framework, some of them conceptual in nature (Brody's Representational Minimalism vs. Strict Derivational Minimalism), some of them empirical (for instance in this handout by David Pesetsky). But the thing is, now that I think about linguistic formalisms a lot more abstractly, these arguments do not make any sense to me anymore. There seem to be some implicit assumptions about what counts as representational or derivational that I have great trouble putting into technical terms, not because I'm incompetent, but because they are arbitrary and inconsistent.

Rabbit Hole 1: Refined Representations

Representations Cannot Handle Timing

Empirical arguments against representational theories usually rely on timing: in order to determine the well-formedness of a given structure, it's not enough to know what moved where, we also need to know in which order things moved. Here's a quick example that takes an early version of Eric Reuland's theory of binding as inspiration (sorry about the paywall). In this system, syntactic binding is tantamount to copying of the antecedent's phi-features to the anaphor under Spec-Head agreement. Since anaphors usually do not start out in such a local configuration, they must undergo covert movement to the head that the antecedent is a specifier of. Here's a simplified example of what the structure of John likes himself might look like if this analysis is combined with a trace-based implementation of movement.

The subject has moved from its vP-internal base position to Spec,TP, while the anaphor DP undergoes phrasal covert movement to Spec,vP for case checking purposes, which is subsequently followed by head movement of the anaphor to T. Now the subject and the anaphor are in Spec-Head configuration and syntactic binding can be instantiated via feature copying. Somewhat baroque, but easy enough to regulate by representational means.
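To make the "easy enough by representational means" point concrete, here is a minimal Python sketch of such a well-formedness check. It assumes a radically simplified encoding of trees as projection → (specifier, head) pairs; all node labels are hypothetical, and this is of course nothing like Reuland's actual formalism.

```python
# Toy representational well-formedness condition: binding is licensed iff
# the antecedent and the (moved) anaphor stand in a Spec-Head configuration
# somewhere in the tree. Trees are flattened to projection -> (spec, head).

def spec_head_licensed(tree, antecedent, anaphor):
    """True iff some projection has `antecedent` as its specifier
    and `anaphor` (after movement) as its head."""
    return any(spec == antecedent and head == anaphor
               for spec, head in tree.values())

# 'John likes himself': the subject sits in Spec,TP and the anaphor has
# head-moved to T, so the two end up in a Spec-Head configuration.
surface = {"TP": ("John", "himself+T"), "vP": ("t_John", "v")}
print(spec_head_licensed(surface, "John", "himself+T"))  # True
```

The check is a pure condition on the output structure: no reference to when anything moved, only to where everything sits in the final tree.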

However, things get trickier when the antecedent moves on after feature copying has taken place. Consider the sentence Who_i does Mary think that Sue introduced t_i to himself. Let's assume for the sake of argument that who first moves to Spec,vP to get its case checked, then to the specifier of the embedded CP, and finally to the specifier of the matrix CP. The reflexive himself, on the other hand, only undergoes covert head movement to v so that binding can be established (let's not worry about whether this is a licit instance of movement in Reuland's system). So now the final structure is more complicated.

Note that the antecedent and the reflexive are in a Spec-Head configuration only by proxy thanks to the trace left behind by the antecedent. So now we have a problem: unless the anaphor moved to v before its antecedent moved out of Spec,vP, the two were never in a Spec-Head configuration and binding should not be possible. In a derivational framework this is unproblematic because we know the respective timing of operations. In a representational framework, though, we can handle such scenarios only if we can infer the "order" of movement steps from independent factors, e.g. something like the Strict Cycle Condition. Order is in quotation marks here because this is exactly where things get interesting.

Reinterpreting Timing

The obvious fix to the timing problem is to mark up representations in a way that encodes the timing of operations. So rather than the tree above, we could have the enriched version below where every node has a numeric index that shows at which step of the derivation it was created.

The obvious complaint about the obvious fix is that this is just window dressing: we've diluted our representational framework by bringing in the derivational concept of timing. Except that, strictly speaking, we haven't. What we have done is introduce another type of index and, presumably, a constraint that regulates the distribution of these indices. This is no different from the index-based approach to binding theory. So why is one inherently derivational, and the other one is not? It can't be about the technical devices themselves, because those are exactly the same. The difference is only in our interpretation of these indices. But judging a technical device by our interpretation of it is rather peculiar because, in general, there are no bounds on how one may interpret some formal piece of machinery.
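The enriched indexation scheme can be sketched in a few lines of Python. The node labels and step numbers below are hypothetical placeholders for the trees discussed above; the point is only that the resulting constraint is a condition on a static, annotated structure, whatever we choose to call the indices.

```python
# Every node carries a numeric index recording the derivation step at
# which it was created. A purely representational constraint then
# compares indices -- no actual notion of "timing" is invoked.

# node -> step index (hypothetical values for the who/himself example)
created_at = {
    "himself_at_v": 3,         # anaphor head-moves to v at step 3
    "who_trace_in_Spec_vP": 4, # antecedent vacates Spec,vP at step 4
}

def binding_ok(created_at, anaphor_landing, antecedent_trace):
    """Well-formedness condition: the anaphor's landing site must carry a
    lower index than the antecedent's trace, i.e. the anaphor arrived
    while the antecedent was still in place."""
    return created_at[anaphor_landing] < created_at[antecedent_trace]

print(binding_ok(created_at, "himself_at_v", "who_trace_in_Spec_vP"))  # True
```

Whether the comparison of indices is read as "the anaphor moved first" or as an arbitrary numerical restriction on co-indexed nodes is entirely up to us; the formal machinery is identical either way.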

String automata, for instance, can be viewed as producing output strings or accepting input strings, it makes no difference. Quantum theory has the Copenhagen interpretation based on probability distributions as well as the many-worlds interpretation, and both are equally compatible with the math. Modal logic is in no way tied to a possible worlds interpretation, we might just as well be talking about knowledge states, configurations of operating systems, and so on. Quite simply, the fact that the new indices have a natural interpretation in terms of derivational timing does not make them derivational (for the sake of my own sanity I won't even try to fathom what makes an interpretation natural or unnatural).
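The automaton case is easy to demonstrate. Below is one and the same transition table for the language (ab)*, read once as an acceptor of input strings and once as a generator of output strings (the generator simply enumerates what the table licenses); the formal object never changes, only our gloss of it.

```python
from itertools import product

# DFA for (ab)* over the alphabet {a, b}: one transition table,
# two interpretations.
delta = {(0, "a"): 1, (1, "b"): 0}
start, finals = 0, {0}

def accepts(s):
    """Acceptor reading: consume an input string, check the final state."""
    q = start
    for c in s:
        if (q, c) not in delta:
            return False
        q = delta[(q, c)]
    return q in finals

def generates(max_len):
    """Generator reading: enumerate the strings the same table licenses."""
    return ["".join(s)
            for n in range(max_len + 1)
            for s in product("ab", repeat=n)
            if accepts("".join(s))]

print(accepts("abab"))  # True
print(generates(4))     # ['', 'ab', 'abab']
```

Nothing in `delta` tells you whether it "really" accepts or produces; both readings agree on the same set of strings.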

And we're just talking about one particular indexation scheme here. It's a rather simple exercise to cook up an equivalent encoding method that cannot be linked to derivational timing in a straightforward manner. So do we want to say that this alternative encoding wouldn't derivationalize our representational theory because it does not readily allow for a derivational interpretation? This seems ludicrous. The only alternative, then, is to eschew any device that allows for some derivational interpretation, no matter how unnatural. But that's even worse because every representational device can be made derivational --- for every filtration model, you can cook up an equivalent generation model (for instance an Abstract Categorial Grammar). If we were to take such a strong position, everything would ultimately be derivational. Which actually isn't all that far from the truth, except that everything is also representational.

Rabbit Hole 2: Derivations as Representations

In the last twenty years it has become common practice in mathematical linguistics to study derivational formalisms by looking at their derivation trees, i.e. the record of which operations applied in which order to yield the desired output structure. A Minimalist grammar, for instance, can be defined in terms of its derivation trees and a mapping from derivation trees to phrase structure trees. Crucially, this definition involves a set of constraints that separate the well-formed derivations from the ill-formed ones. In other words, what we have is a representational, constraint-based theory of derivations: generate all logically possible derivations, then use constraints X, Y, and Z to filter out those derivations that are deemed ill-formed by the grammar.
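The generate-and-filter recipe can be sketched directly. The Python fragment below is not an actual Minimalist grammar; the operation names and the two constraints are hypothetical stand-ins, chosen only to show the shape of the representational view: enumerate candidate derivations, then keep those that satisfy the well-formedness constraints.

```python
from itertools import permutations

# Candidate derivations are sequences of operations. Here we brute-force
# all orderings of three toy operations for a transitive clause with
# subject movement (hypothetical names, not an MG fragment).
operations = ["merge_obj", "merge_subj", "move_subj"]

def well_formed(derivation):
    # Constraint 1: the subject must be merged before it can move.
    # Constraint 2: structure is built bottom-up, so the object merges first.
    return (derivation.index("merge_subj") < derivation.index("move_subj")
            and derivation[0] == "merge_obj")

all_derivations = list(permutations(operations))
good = [d for d in all_derivations if well_formed(d)]
print(good)  # [('merge_obj', 'merge_subj', 'move_subj')]
```

Of the six logically possible orderings, the constraints filter out five. The grammar is stated entirely as conditions on derivations-as-objects, yet what it describes is a derivational process.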

This perspective is just as natural as the canonical derivational one, and in many cases it is actually a lot more convenient. My thesis, for instance, would be quite convoluted without the representational view of MGs (I'll just assume that its current incarnation is a beacon of lucidity). With this approach we do not even have to mark up representations or bring in other technical shenanigans, the data structures of the theory stay exactly the same. The only difference is in how we define them.

At this point we can again ask ourselves what this means for the derivational/representational split. MGs are without a doubt derivational, yet we can easily view them from a representational perspective. One might object that this theory is not representational because we are constraining derivation trees rather than the syntactic output structure. But this is once again a matter of how we interpret our formalism, not what the formalism does on a technical level. Phrase structure trees and derivation trees are both trees. That we interpret one as the actual structure of the sentence and one as a record of how the sentence was built is a historical accident. We could just as well treat the derivation trees as the primary data structure of syntax proper, as I argued for in an earlier post, with no negative effects whatsoever.

So now we are at a point where
  1. all derivational theories have a natural representational interpretation, and
  2. every representational theory can be recast in derivational terms.
This doesn't leave any room for axiomatically preferring one over the other.

The General Upshot

I've thought a lot about this supposed divide between representational and derivational approaches over the course of the last few years. I've tried to explore it from a variety of different angles --- generative capacity, succinctness, expressing generalizations, ease of use, psychological plausibility --- but no matter which assumptions and criteria one adopts, by pushing them to their logical conclusion one always ends up with the same picture: the representational/derivational distinction does not exist.

The two terms do not carve out incompatible classes of formalisms that differ in some measurable way. No matter how much you try to nail down the parameters, e.g. by banning look-ahead, there's always a way out. You might not like this way out, you may consider it a sneaky loophole that loses the spirit of being representational or derivational. But since we can't cogently pin down just what that spirit is, such sentiments operate in the realm of aesthetic judgment.

That doesn't mean nothing of use has come from previous discussions of the two approaches; at the very least they unearthed some new data. But so did many technical discussions of the 60s, 70s and 80s that are now considered obsolete. I think it's time to come to terms with the fact that categorizing theories according to their degree of representationality/derivationality is like grouping chemicals according to their taste and smell: a pretty fuzzy affair that captures no chemical properties of interest (unless you're baking a cake).


  1. I think it is pretty much indisputable that there is no formally precise way to separate representational from derivational theories. When Chomsky comments on the representational vs. derivational issue, he usually qualifies his remarks with the observation that it is always possible to convert derivational to representational theories via "coding devices". To the extent that he has tended to favor theories with a derivational component, his argument has been that the derivational theories in question are more “natural” than any of the obvious representational alternatives. So for example, your second tree (or rather, tree derivation) has a nice intuitive interpretation in terms of a timing metaphor, whereas the third tree makes use of a formal device which seems entirely arbitrary. Only on the derivational story do I get a kind of “aha!” moment when I see the explanation for why binding of the reflexive is possible. (This is perhaps because one system of indexation seems as good as any other, whereas restrictions on the order of operations often follow “naturally” once the operations themselves are specified.)

    Of course, it could be argued that we should just ignore hunches about which theories are more “natural”. I don’t think we should ignore them entirely, but I accept that it is hard to justify our reliance on them.

    I suspect we’d both agree that there is not at present much insight to be gained by pitting derivational and representational theories against each other. I find the parsing argument particularly difficult to grasp, since when you’re writing a parser it seems to make no difference at all whether the grammar was stated in derivational or representational terms. BNF notation, for example, can be given either a derivational or representational interpretation. If you’re writing a C compiler, you don’t care if the guys who wrote the BNF grammar in the C standard had a derivational or representational interpretation in mind.

    1. Since Omer's post just went online, I suggest we keep all discussion localized there. My reply to you is also there.