Friday, June 26, 2015

Aspects at 50

Here is a piece (here) by Geoff Pullum (GP) celebrating the 50th anniversary of Aspects. This is a nice little post. GP mentions a second post, on the competence/performance distinction, in the one linked to here. I'll put up a companion piece to that second post anon. Here are a few comments on GP's Aspects post.

Here GP gives a summary that feels right to me (i.e. my recollections match GP's) about the impact that Aspects had on those that first read it. Reading chapter 1 felt revelatory, like a whole new world opening up. The links that it forged between broad issues in philosophy (I was an undergrad in philosophy when I first read it) and central questions in cognition and computation were electrifying. Everyone in cognition broadly construed (and I do mean everyone: CSers, psychologists, philosophers) read Aspects and believed that they had to read it. Part of this may have been due to some terminological choices that Chomsky came to regret (or so I believe). For example, replacing the notion "kernel sentence" with the notion "deep structure" led people to think, as GP put it:

Linguistics isn’t a matter of classifying parts of sentences anymore; it was about discovering something deep, surprising and hidden.

But this was not the reason for its impact. The reason Aspects was a go-to text was that chapter 1 was (and still is) a seminal document of the Cognitive Revolution and the study of mind. It is still the best single place to look if one is interested in how the study of language can reveal surprising, non-trivial features about human minds. So perhaps there is something right about the deep in Deep Structure. Here’s what I mean.

I believe that Chomsky was in a way correct in his choice of nomenclature. Though Deep Structure itself was/is not particularly "deep," understanding the aim of syntax as that which maps phrase markers that represent meaning-ish information (roughly thematic information, which, recall, was coded at Deep Structure)[1] onto structures that feed phonetic expression is deep. Why? Because such a mapping is not surface evident, and it involves rules and abstract structure with their own distinctive properties. Aspects clarifies what is more implicit in Syntactic Structures (and LSLT, which was not then widely available); namely, that syntax manipulates abstract structures (phrase markers). In particular, in contrast to Harris, who understood transformations as mapping sentences (actually items in a corpus (viz. utterances)) to sentences, Aspects makes clear that this is not the right way to understand transformations or Gs. The latter map phrase markers to other phrase markers and eventually to representations of sound and meaning. They may capture relations between sentences, but only very indirectly. And this is a very big difference in the conception of what a G is and what a transformation is, and it all arises in virtue of specifying what a Deep Structure is. In particular, whereas utterances are plausibly observable, the rules that do the mappings that Chomsky envisaged are not. Thus, what Aspects did was pronounce that the first object of linguistic study is not what you see and hear but the rules, the Gs, that mediate two "observables": what a sentence means and how it is pronounced. This was a really big deal, and it remains a big deal (once again, reflect on the difference between Greenberg and Chomsky Universals). As GP said above, Deep Structure moves us from meditating on sentences (actually, utterances or items in corpora) to thinking about G mappings.

Once one thinks of things in this way, then the rest of the GG program follows pretty quickly: What properties do Gs have in common? How are Gs acquired on the basis of the slim evidence available to the child? How are Gs used in linguistic behavior? How did the capacity to form Gs arise in the species? What must G-capable brains be like to house Gs and FL/UGs? In other words, once Gs become the focus of investigation, then the rest of the GG program comes quickly into focus. IMO, it is impossible to understand the Generative Program without understanding chapter 1 of Aspects and how it reorients attention to Gs and away from, as GP put it, "classifying parts of sentences."

GP also points out that many of the details that Aspects laid out have been replaced with other ideas and technology. There is more than a little truth to this. Most importantly, in retrospect, Aspects technology has been replaced by technicalia more reminiscent of the Syntactic Structures (SS)-LSLT era. Most particularly, we (i.e. minimalists) have abandoned Deep Structure as a level. How so?

Deep Structure in Aspects is the locus of G recursion (via PS rules) and the locus of the interface with the thematic system. Transformations did not create larger phrase markers, but mapped these Deep Structure PMs into others of roughly equal depth and length.[2] In more contemporary minimalist theories, we have returned to the earlier idea that recursion is not restricted to one level (the base), but is a function of the rules that work both to form phrases (as PS rules did in forming Deep Structure PMs) and to transform them (e.g. as movement operations did in Aspects). Indeed, Minimalism has gone one step further. The contemporary conceit denies that there is a fundamental distinction between G operations that form constituents/units and those that displace expressions from one position in a PM to another (i.e. the distinction between PS rules and transformations). That's the big idea behind Chomsky's modern conception of Merge, and it is importantly different from every earlier conception of G within Generative Grammar. Just as LGB removed constructions as central Gish objects, Minimalism removed the PS/transformation rule distinction as a fundamental grammatical difference. In a Merge-based theory there is only one recursive rule, and both its instances (viz. E- and I-merge) build bigger and bigger structures.[3]

Crucially (see note 3), this conception of structure building also effectively eliminates lexical insertion as a distinct G operation, one, incidentally, that absorbed quite a bit of ink in Aspects. However, it appears to me that this latter operation may be making a comeback. To the degree that I understand it, the DM idea that there is late lexical insertion comes close to revitalizing this central Aspects operation. In particular, on the DM conception, it looks like Merge is understood to create grammatical slots into which contents are later inserted. This distinction between an atom and the slot that it fills is foreign to the original Merge idea. However, I may be wrong about this, and if so, please let me know. But if so, it is a partial return to ideas central to the Aspects inventory of G operations.[4]

In addition, in most contemporary theories, there are two other lasting residues of the Aspects conception of Deep Structure. First, Deep Structure in Aspects is the level where thematic information meets the G. This relation is established exclusively by PS rules. This idea is still widely adopted and travels under the guise of the assumption that only E-Merge can discharge thematic information (related to the duality of interpretation assumption). This assumption regarding a "residue" of Deep Structure is the point of contention between those who debate whether movement into theta positions is possible (e.g. I buy it, Chomsky doesn't).[5] Thus, in one sense, despite the "elimination" of DS as a central minimalist trope, there remains a significant residue that distinguishes those operations that establish theta structure in the grammar from those that transform these structures to establish the long-distance displacement operations that are linguistically ubiquitous.[6]

Second, all agree that theta domains are the smallest (i.e. most deeply embedded) G domains. Thus, an expression discharges its thematic obligations before it does anything else (e.g. case, agreement, criterial checking etc.). This again reflects the Aspects idea that Deep Structures are inputs to the transformational component. This assumption is still with us, despite the "elimination" of Deep Structure. We (and here I mean everyone, regardless of whether you believe that movement to a theta position is licit) still assume that a DP E-merges into a theta position before it I-merges anywhere else, and this has the consequence that the deepest layer of the grammatical onion is the theta domain. So far as I know, this assumption is axiomatic. In fact, why exactly sentences are organized so that the theta domain is embedded in the case/agreement domain, which is in turn embedded in A'-domains, is entirely unknown.[7]

In short, Deep Structure, or at least some shadowy residue, is still with us, though in a slightly different technical form. We have abandoned the view that all thematic information is discharged before any transformation can apply. But we have retained the idea that for any given “argument” its thematic information is discharged before any transformation applies to it, and most have further retained the assumption that movement into theta positions is illicit. This is pure Deep Structure.

Let me end by echoing GP's main point. Aspects really is an amazing book, especially chapter 1. I still find it inspirational, and every time I read it I find something new. GP is right to wonder why there haven't been countless celebrations of the book. I would love to say that it's because its basic insights have been fully absorbed into linguistics and the wider cognitive study of language. They haven't. It's still, sadly, a revolutionary book. Read it again and see for yourself.

[1] Indeed, given the Katz-Postal hypothesis all semantic information was coded at Deep Structure. As you all know, this idea was revised in the early 70s, with both Deep Structure and Surface Structure contributing to interpretation. Thematic information was still coded in the former, while scope, binding, and other semantic effects were coded in the latter. This led to a rebranding, with Deep Structure giving way to D-Structure. This more semantically restricted level was part of every subsequent mainstream generative G until the more contemporary Minimalist period. And, as you will see below, it still largely survives in modified form in thoroughly modern minimalist grammars.
[2] “Roughly” because there were pruning rules that made PMs smaller, but none that made them appreciably bigger.
[3] In earlier theories PS rules built structures that lexical insertion and movement operations filled. The critical feature of Merge that makes all its particular applications structure building operations is the elimination of the distinction between an expression and the slot it occupies. Merge does not first form a slot and then fill it. Rather expressions are combined directly without rules that first form positions into which they are inserted.
[4] Interestingly, this makes "traces" as understood within GB undefinable, and this makes both the notion of trace and that of PRO unavailable in a Merge-based theory. As the rabbis of yore were fond of mumbling: Those who understand will understand.
[5] Why only E-merge can do so despite the unification of E and I merge is one of the wedges people like me use to conceptually motivate the movement theory of control/anaphora. Put another way, it is only via (sometimes roundabout) stipulation that a minimalist G sans Deep Structure can restrict thematic discharge to E-merge.
[6] In other words, contrary to much theoretical advertising, DS has not been entirely eliminated in most theories, though one central feature has been dispensed with.
[7] So far as I know, the why question is rarely asked. See here for some discussion and a link to a paper that poses and addresses the issue.


  1. Interesting that you liken Minimalism to Syntactic Structures rather than Aspects. One could just as well analyze Minimalism as a return to Aspects: Deep Structure is furnished by derivation trees, which can be described in terms of context-free phrase structure grammars, and derivation trees are mapped to phrase structure trees that are only linearly bigger (unless you have overt copying). There is also a difference between Merge and Move in that the mapping brings about major structural changes for the former but not the latter.

    I like this perspective because it highlights that we have made a lot of progress in characterizing this mapping. Peters & Ritchie showed that Aspects didn't have a good handle on that, which is why you got Turing equivalence even with very harsh restrictions on D-Structure. They also pointed out in a follow-up paper (which seems to have been ignored at the time of publication) that bounding the mapping with respect to the size of D-Structure lowers expressivity quite a bit. What you get is a non-standard class that generates

    1) all context-free languages,
    2) some but not all context-sensitive languages,
    3) some (properly) recursively enumerable languages.

    In hindsight, we can recognize this as a first rough approximation of the mildly context-sensitive languages. So Aspects was on the right track, but the mapping was still too powerful --- and also too complicated, which is why the Peters and Ritchie proofs are pretty convoluted. Unifying all transformations in terms of a few strongly restricted movement operations (and doing away with deletion) has really cleared things up.

    Just to be clear, I'm not saying that your characterization is less adequate. Rather, this is a nice demonstration that one and the same piece of technical machinery can be conceptualized in various ways to highlight different aspects.

  2. @Thomas Graf What's the reference for the follow-up P&R paper?

    1. @INCOLLECTION{PetersRitchie73a,
      author = {Peters, Stanley and Ritchie, Robert W.},
      title = {Non-Filtering and Local-Filtering Transformational Grammars},
      year = {1973},
      editor = {Hintikka, Jaakko and Moravcsik, J.M.E. and Suppes, Patrick},
      booktitle = {Approaches to Natural Language},
      publisher = {Reidel},
      address = {Dordrecht},
      pages = {180--194}
      }

      Let me flesh out the technical side a bit. P&R show that a transformational grammar restricted to context-free D-structures and local-filtering transformations is rather peculiar with respect to weak generative capacity. The claims I made above are established as follows:

      1) The fact that every context-free language can be generated is an immediate consequence of the context-freeness of D-structures.
      2) These restricted transformational grammars cannot generate the language a^(2^(2^n)), which is context-sensitive.
      3) P&R show that every recursively enumerable language is the intersection of some transformational language with a regular language. But since the next weaker class --- the class of recursive languages --- is closed under intersection with regular languages, the previous result can hold only if transformational grammars generate some non-recursive (and thus properly recursively enumerable) languages.

      One minor correction to what I said in point 2 above: the paper does not show that some properly context-sensitive languages are generated by this formalism. It is in principle possible that expressivity jumps immediately from context-free all the way up to a specific subset of the recursively enumerable languages. That said, I am pretty sure that context-sensitive languages like a^n b^n c^n or a^(2^n) can be generated by the formalism, though I haven't worked out specific transformational grammars for these languages.
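      To make the candidate languages concrete, here is a small Python sketch (mine, not anything from P&R --- it only decides membership, it says nothing about whether the restricted transformational formalism generates these languages) of membership tests for a^n b^n c^n and for a^(2^n), i.e. strings of a's whose length is a power of two:

```python
import re

def in_anbncn(s):
    """Membership test for the (properly) context-sensitive
    language { a^n b^n c^n : n >= 1 }."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

def in_a_pow2(s):
    """Membership test for { a^(2^n) : n >= 0 }: strings of a's
    whose length is a power of two (checked via the bit trick
    k & (k - 1) == 0)."""
    k = len(s)
    return k > 0 and set(s) <= {"a"} and (k & (k - 1)) == 0
```

      Deciding membership is easy, of course; the interesting question raised above is whether grammars of the restricted sort can generate these languages.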