Monday, March 20, 2017

Lexical items and "mere" morphology-1

This was intended to be short. I failed. It is long and will get longer. Here is part 1. Part 2 sometime later this week or next.

As I mentioned in an earlier post, I am in the process of co-editing a volume of commentary essays on Syntactic Structures (SS). The volume is scheduled to be out just in time for the holidays and will, I am sure, make a great gift.  Nothing like an anniversary copy of SS with a compendium of essays elaborating its nuances to while away the holidays. I mention this because the project has got me thinking about how our theories of grammar have changed over time. And this brings me to the topic of today’s question: are all morphemes created equal?

Interestingly, GG theories answer this question differently. SS and Aspects sharply distinguish, at least theoretically, between two kinds of morphemes: those that enter derivations via lexical insertion, and those that enter transformationally. In this way, these theories make a principled distinction between grammatical vs non-grammatical formatives and track their grammatical differences to different G etiologies.

Later theories (take GB as the poster child) distinguish lexical vs functional morphemes, but, and this is important, there appears to be no principled distinction here. The latter more closely track important G features, but both types of formatives enter derivations in the same way (via lexical insertion or heads of X’ projections) and are manipulated by the same kinds of rules. The main difference (which I return to) is that some lexical items require specific grammatical licensing conditions (e.g. reflexives, pronouns, wh-elements) while others don’t (there is no grammatical licensing condition for ‘cat’ or ‘husband’). Functional elements are also often designated “closed class” items, but this classification carries no obvious theoretical import, at least within the theory of grammar. Rather, the designation is descriptive and adverts to the statistical frequency of these elements. Grammatically speaking, it is unclear what makes an expression “functional” beyond the fact that we designate it as such.

Minimalist accounts fall roughly on the GB side of these issues. This, I believe, is unfortunate, for the earlier distinction between lexical and grammatical formatives is, IMO, worth a modern investigation. Before saying a few words about why I believe this, let me indulge my penchant for Whig History and illustrate the distinction by contrasting the older Lees-Klima binding theory with the more modern GB view. Readers be warned, this will not be a short excursus.

Let’s start with the Lees-Klima (LK) (1963) account.  The theory invokes the following two rules.  They must apply when they can, and they are ordered so that (1) applies before (2).
            (1) Reflexivization:
            X-NP1-Y-NP2-Z --> X-NP1-Y-pronoun+self-Z (where NP1=NP2, pronoun has the φ-features of NP2, and NP1 and NP2 are in the same simplex sentence).
            (2) Pronominalization:
            X-NP1-Y-NP2-Z --> X-NP1-Y-pronoun-Z (where NP1=NP2 and pronoun has the φ-features of NP2).

As is evident, the two rules have very similar forms. Both apply to identical NPs and morphologically convert one to a reflexive or pronoun. (1), however, only applies to nominals in the same simplex clause, while (2) is not similarly restricted. As (1) obligatorily applies before (2), reflexivization will bleed the environment for the application of pronominalization by changing NP2 to a reflexive (thereby rendering the two NPs non-identical).  The rule ordering effectively derives the complementary distribution of bound pronouns and reflexives. 

An illustration will help make things clear. Consider the derivations of (3a).  It has the underlying structure in (3b). We can factor this as in (3c) as per the reflexivization rule (1). This results in converting (3c) to (3d) with the surface output (3e) carrying a reflexive interpretation.
(3)       a. John1 washed himself/*him
            b. John washed John
            c. X-John-Y-John-Z
            d. X-John-Y-him+self-Z
            e. John washed himself
What blocks John washed him with a similar reflexive reading? To get this structure requires that Pronominalization apply to (3c).  However, it cannot, as (1) is ordered to obligatorily apply before (2).  Once (1) applies we get (3d), and this no longer has a structural description amenable to (2). Thus, the application of (1) bleeds that of (2), and John washed him with a bound reading cannot be derived.

This changes in (4). Reflexivization cannot apply to (4c) as the two Johns are not in the same clause. As (1) cannot apply, (2) can (indeed, must) as it is not similarly restricted to apply to clausemates. In sum, the inability to apply (1) allows the application of (2). Thus does the LK theory derive the complementary distribution of reflexives and bound pronouns.
(4)       a. John believes that Mary washed *himself/him
            b. John believes that Mary washed John
            c. X-John-Y-John
            d. X-John-Y-him
            e. John believes that Mary washed him
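The two derivations above can be mimicked with a small Python sketch. It is only an illustration under my own assumptions, not LK's formalism: the `NP` record, the integer clause ids, the `pronoun` field (standing in for the agreement features of the NP) and the `rewritten` flag are all hypothetical, and "NP1 = NP2" is modeled as simple identity of form.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class NP:
    form: str                # current surface form, e.g. "John"
    clause: int              # hypothetical id of the simplex clause containing the NP
    pronoun: str             # pronoun matching the NP's agreement features, e.g. "him"
    rewritten: bool = False  # guard: an already converted NP is never rewritten again

def _rewrite_once(words, new_form, clausemates_only):
    """Rewrite the first full NP2 that has an identical NP1 to its left."""
    for j, np2 in enumerate(words):
        if not isinstance(np2, NP) or np2.rewritten:
            continue
        for np1 in words[:j]:
            if not isinstance(np1, NP) or np1.form != np2.form:
                continue  # NP1 = NP2 is modeled as identity of form
            if clausemates_only and np1.clause != np2.clause:
                continue
            changed = list(words)
            changed[j] = replace(np2, form=new_form(np2), rewritten=True)
            return changed, True
    return words, False

def reflexivization(words):    # rule (1): clause-mates only
    return _rewrite_once(words, lambda np: np.pronoun + "self", True)

def pronominalization(words):  # rule (2): no clause-mate restriction
    return _rewrite_once(words, lambda np: np.pronoun, False)

def derive(words, rules=(reflexivization, pronominalization)):
    """Apply each rule obligatorily (until it no longer fits), in the given order."""
    for rule in rules:
        applied = True
        while applied:
            words, applied = rule(words)
    return " ".join(w.form if isinstance(w, NP) else w for w in words)

s3 = [NP("John", 1, "him"), "washed", NP("John", 1, "him")]
s4 = [NP("John", 1, "him"), "believes", "that",
      NP("Mary", 2, "her"), "washed", NP("John", 2, "him")]

print(derive(s3))  # John washed himself
print(derive(s4))  # John believes that Mary washed him
# Reversing the order lets (2) apply first and yields the unwanted bound pronoun:
print(derive(s3, rules=(pronominalization, reflexivization)))  # John washed him
```

In the default order, (1) turns the second John of (3) into himself, so (2) no longer finds two identical NPs. That is the bleeding relation described above, and the reversed order shows what goes wrong without it.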
There are other features of note:

·      *LK Grammars code for antecedence: Anaphoric dependency is grammatically specified. In other words, just as the antecedent of a reflexive is determined by (1), the antecedent of an anaphoric pronoun is determined by (2). If one understands “NP1 = NP2” to mean that the two nominals must (at least) have the same semantic value (i.e. that NP1 semantically binds NP2) then what the equality expresses is the idea that the grammar codes semantic binding and semantic antecedence.[1]  This has two consequences. First, that the grammar codes binding dependencies, not (co-)referential dependencies.[2] Second, there is no analogue of GB’s Condition B, which grammatically states an anti-binding restriction. (1) and (2) together determine the class of anaphoric dependencies. There is no specific coding for disjoint reference or anti-anaphora.[3]
·      *Some operations have priority over others. A key feature of the LK approach is that reflexivization obligatorily applies before pronominalization.  Were the operations either freely ordered or not obligatory then John hugged him would support the bound reading of the pronoun.  In effect, the LK account embodies an economy conception wherein reflexivization is preferred to (is obligatorily ordered before) pronominalization. Absent this preference, locally bound pronouns would be grammatically generated.  This point is made evident by considering a slight alternative version of the Pronominalization rule. Assume that we added the following rider to (2): NP1 and NP2 are not contained in the same simplex clause.  This codicil is analogous to the restriction in (1), where Reflexivization is limited to clause-mates.  Interestingly, this amendment allows (1) and (2) to be freely ordered. The clause-mate condition in (1) restricts application to clause-mated nominals and the one in (2) to non-clause-mated NPs. This suffices to prevent the illicit pronouns and reflexives in (3a)/(4a).[4]
·      *The LK approach is dependency centered, not morpheme centered. (1) and (2) primarily code antecedence relations, not morpheme distributions. A by-product of the dependency (in English) is the insertion of reflexive and pronominal morphemes. These are clearly surface morpho-phonological byproducts of the established dependency and can be expected to differ across languages.[5] Stated more baldly, one can have reflexive and bound pronoun constructions without reflexives or bound pronouns.  This gives the LK theory two distinctive characteristics when viewed with a modern eye. First, it distinguishes between morphemes that enter derivations from the lexicon and those that do not.  Second, it endows this distinction between morpheme types with semantic significance. In the context of the Standard Theory, the LK background theory, bound pronouns and reflexives are semantically inert. Here Deep Structure exclusively determines semantic interpretation. Consequently, as reflexive and bound pronoun morphemes are not in Deep Structure but are introduced in the course of the syntactic derivation, they must be interpretively impotent.  There is one more interesting consequence: the LK conception rejects a central feature of later accounts, namely that morphological identity is a good guide to syntactic or semantic categorization.  In other words, for LK theorists, the mere fact that bound pronouns and deictic pronouns have the same morpho-phonological shape in many languages is at best a weak prima facie reason for treating them as a unified syntactic class.
·      *The binding rules in (1) and (2) also effectively derive a class of principle C effects given the background assumption that reflexives and pronouns morphologically obscure an underlying copy of the antecedent.[6] The derivation, however, is not particularly deep.  By stipulation, the rules retain the higher copies and morphologically reshape the lower ones into pronouns and reflexives. This has the effect of blocking the derivation of sentences like Himself kissed Bill, He thinks that John is tall, and (if the rules are ordered before WH-movement, aka Question formation) Who1 did he say t1 left. There are two noteworthy features of this account of principle C effects. First, as noted, it is not deep, for there is no reason why the rules could not have been stated so that the higher copy (optionally) gets morphologically transmogrified. Were this possible, all the indicated unacceptable sentences would be fully grammatically generated.  Second, this version of principle C effects only holds for bound anaphors. It does not extend to co-referential dependencies, which fall outside the purview of this version of the binding theory.  This is not, in itself, a bad thing. As has been noted, there are well known “exceptions” to principle C where co-reference is tolerated. On the LK account, this is to be expected.[7]
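The alternative considered in the second bullet, where Pronominalization carries the rider that NP1 and NP2 are not clause-mates, can also be checked with a short Python sketch. Everything here is an illustrative assumption of mine: NPs are encoded as hypothetical (form, clause id, pronoun) tuples, identity is identity of form, and both rules are still taken to be obligatory. The point is only that, with the rider added, every ordering of the two rules gives the same output for (3) and (4).

```python
from itertools import permutations

# Hypothetical encoding: an NP is a (form, clause_id, pronoun) tuple,
# every other word is a plain string.
def apply_rule(words, want_clausemates, suffix):
    """One obligatory pass: rewrite each full NP2 that has a matching NP1 to its left."""
    out = list(words)
    for j, w in enumerate(out):
        if isinstance(w, str):
            continue
        form, clause, pron = w
        if form in (pron, pron + "self"):
            continue  # already converted by an earlier rule
        for v in out[:j]:
            if isinstance(v, str) or v[0] != form:
                continue  # NP1 = NP2 is modeled as identity of form
            if (v[1] == clause) == want_clausemates:
                out[j] = (pron + suffix, clause, pron)
                break
    return out

def reflexivization(words):       # rule (1): clause-mates only
    return apply_rule(words, True, "self")

def pronominalization_nc(words):  # rule (2) plus the rider: non-clause-mates only
    return apply_rule(words, False, "")

def surface(words):
    return " ".join(w if isinstance(w, str) else w[0] for w in words)

def outputs(words):
    """Collect the surface strings produced by every ordering of the two rules."""
    results = set()
    for order in permutations((reflexivization, pronominalization_nc)):
        derived = words
        for rule in order:
            derived = rule(derived)
        results.add(surface(derived))
    return results

s3 = [("John", 1, "him"), "washed", ("John", 1, "him")]
s4 = [("John", 1, "him"), "believes", "that",
      ("Mary", 2, "her"), "washed", ("John", 2, "him")]

print(outputs(s3))  # {'John washed himself'}
print(outputs(s4))  # {'John believes that Mary washed him'}
```

Each call returns a single string, so the complementary distribution no longer depends on ordering. The price is that the clause-mate condition is now stated twice, once in each rule.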

In sum: for LK the syntax outputs antecedent-anaphor dependencies. This is explicitly and directly coded in the relevant binding rules.  The proposal has two central features: an economy condition in the guise of the preference for reflexivization over pronominalization and a distinction between formatives that enter derivations via rules like Lexical Insertion (e.g. words like cat, dog, the, this, deictic pronouns, etc.) and those that are the morphological by-products of rules of grammar (e.g. words like himself and certain bound hims that are morphological residues of established anaphoric dependencies).

[1] It must code more than this, however, for otherwise (2) could apply to the output of (1). It would suffice to block this to assume that some kind of syntactic identity is also required, e.g. that the two be tokens of the same type. For further discussion cf. Hornstein 2001 and note 3.
[2] Figuring out how to make this clear led to problems with the original LK account. For example, how exactly to code (i)?  It does not semantically express (ii).

(i)            Everyone hugged himself
(ii)          Everyone hugged everyone
Interestingly, this problem for the LK theory has an answer in contemporary minimalist approaches if we take binding to be a chain relation.  In effect, the difference between the underlying form of (i) vs (ii) is that the latter has two selections of everyone in the numeration while the former has one. In other words, if we treat (1) and (2) as morphological spell out rules defined over chains, this problem disappears.  Cf. Drummond 2011, Idsardi and Lidz 1997 and Hornstein 2001 for discussion. We return to this point again later on.
[3] Lasnik’s 1976 proposal for an anti-co-reference rule is built around the problems regarding “accidental” co-reference that this fact entails.  Contemporary attempts to return to the LK vision have roughly followed Reinhart in assuming that the possibility of grammatical binding restricts extra-grammatical co-reference options.
[4] We must still assume that they are obligatory, but this is to block principle C effects (e.g. John saw John, and John said that Mary likes John) rather than assure the complementarity of reflexives and bound pronouns.
[5] This is very much a Distributed Morphology conception, though in earlier theoretical guise.
[6] Recall that this assumption creates problems for quantified NP antecedents as remarked in footnote 2.
[7] Cf. Evans and Reinhart among others.  Note, in addition, that there are virtually no extant cases of inverse binding, i.e. where a pronoun is anaphorically dependent on an antecedent it c-commands.  Furthermore, even WCO configurations would seem to be underivable given the actual rules proposed.  Nice as this is, it is worth recalling that this empirical success arises from codifying the stipulation that it is the higher/leftmost copy that is retained and the lower/rightmost copy that gets morphologically altered.


  1. I'm glad to see the distinction between lexical and function morphemes revisited in syntactic theory, because this is a very salient dimension in neuroscience. Studies of aphasia and neuroimaging studies find robust distinctions between these types of words, which would make it a happy convergence if syntactic theory also had splits along these lines.

    This video is an excellent anecdote underlying this point; it's really an amazing video of a presentation given by an actor with Broca's aphasia who explains this very distinction in his difficulty in producing and understanding language:

    1. The combinatorial operations of language seem more difficult (at least in this case) which also implies Merge is a more complex process.

  2. With regard to lexemes and syntax, some readers of FoL might be interested in Jan Koster's recent take (or retake) — 21. Koster.pdf. Likely rather too open-minded to get Norbert's stamp of approval, but some worthy points in there about language as a biological as well as cultural phenomenon, and an interesting take on the history of the generative enterprise.

    1. An interesting read, but nonetheless a very peculiar take on the history of the field.

      1) The whole question whether the Chomskyan revolution was a revolution strikes me as moot. It reminds me of the sociology of science papers I read as an undergrad, where, if you look closely enough, no scientific revolution ever turns out to be a real revolution. Some things were already around earlier, and the novel ideas that create all the excitement end up looking a lot more like the previous model after 20 years of tweaking to accommodate the empirical facts. Heck, this is even true for Kuhn's idea of scientific revolution: Fleck already did it years earlier, it still incorporated aspects of many previous proposals, and every revised model of scientific revolutions ends up downplaying the role of the revolution part. But so what?

      The bottom line is that linguistics changed very rapidly starting in the late 50s, with a lot more intellectual mind share among other fields, a continuous growth in the number of departments, faculty, students, and publications. Eventually that might have also happened without the excitement surrounding early Transformational Grammar, but I wouldn't bet on it. So at an institutional level, there was a revolution. At best one can doubt, then, that there was an intellectual revolution, which takes me to point 2.

      2) Koster's main claim is that none of the central tenets of early transformational grammar have survived into newer iterations, but I don't see anything to back up this claim.

      - Transformations: Contrary to what Koster says, transformations are still around. They might not be called EquiNP and similarly cute names anymore, but what you call something really doesn't matter. Transformations are a mechanism for mapping trees to trees, and we still do that all the time: mapping syntactic trees to prosodic trees or LFs, linearizing trees, spelling out feature bundles as inflected lexical items, transderivational constraints, those are all tree transductions and hence transformations.

      - Levels: Levels are still around. D-structure corresponds to derivation trees, and the levels like SS, PF, and LF are the output of specific transformations that apply to these derivation trees. If we look at Minimalist grammars, even Aspects' PSG for the base component is still around because the derivation trees can be generated by a context-free grammar.

      - Two-step generation: see the previous two points.

      - Use of formal methods: I don't think I need to explain how I feel about that one, it comes up around here often enough.

      - Syntax instead of signs: On a formal level, that is not a meaningful distinction to begin with. You can always lexicalize and delexicalize your formalism as you see fit. From a practical perspective, I do not see anything that supports the claim that modern generative grammar is a theory of signs beyond the fact that lexical items are routinely viewed as a triple of syntactic, phonetic, and semantic components. And none of the central points of interest --- locality restrictions on movement, binding, control, Agree --- use the latter two components a lot.

      3) The third section argues that language must be construed as a cultural phenomenon. I don't see why, just like I don't see why language must be construed as a biological enterprise in order for linguists to do good work. Yes, a conceptual frame of mind sharpens certain questions and heightens their importance, but it is not this mysterious thing that magically makes or breaks research. When you're trying to figure out scrambling, multiple wh, the Person Case Constraint, or the properties of Late Adjunction, you're not choosing your account based on whether it is cultural or biological. You pick whatever works.

    2. I just read the Koster piece and agree with Thomas overall. I would add two things. First, even if language is in some sense a cultural artifact, it is, as Koster notes, "constrained by our individual biological properties." Actually K thinks that this need not even be said because it is so obvious. I agree. But if this is obvious, then a plausible question is which biological properties are involved and to what degree they are linguistically specific. So far as I can tell, the history of GG can be seen as addressing just this question, the specificity of the capacity changing over time. In other words, the dichotomy that K pushes, culture or biology, is a false dichotomy and was recognized as such from the start.

      Second, over 60 years GG has identified a class of dependencies that Gs allow (and don't allow). These characteristics seem unique to language (so far as we can tell). If we have identified them correctly (and I, maybe contra K, believe we have a pretty good handle on them) it is reasonable to ask why we have this class of dependencies and not others. Contra K I take this to be the minimalist question. It is not a theory aiming to "replace" GB so much as explain its properties. Confusing the minimalist question as K appears to do has really muddied up the intellectual waters.

      Last of all, lexicalism. The issue of what constitutes the info that FoL embodies LIs with is still at the center of the discipline. I agree with K that we have made relatively little progress in understanding this beyond coding info into the items. But the question is not (or not only) whether items are so coded, but also the etiology of the relevant features and the reach that features have. This is where locality conditions and the range of possible dependencies are critical. And this is where syntax has made progress. In other words, though we don't know much about the feature structures themselves, we have a pretty good idea of when features can interact. Heads can see heads (but not specifiers and complements), reflexives can see antecedents locally and if c-commanded, etc. Why these limitations? We have some ideas about this due to the work of the last 60 years. So while we don't really understand much (imo) about the substance of lexemes, we do understand quite a bit about the dependencies they can enter into. In terms of Aspects vocab: we have decent theories of structural/formal universals but almost nothing about substantive universals. This is why, I suspect, a history that focuses on these sees such little progress. There hasn't been much. But of course if you abstract from where the action has been, the rest looks dull.

      So, I agree with Thomas's take. K is a very smart person who has made significant contributions to linguistics over a long career. However, I think his Whig History is not particularly enlightening.

  3. If one understands “NP1 = NP2” to mean that the two nominals must (at least) have the same semantic value (i.e. that NP1 semantically binds NP2) then what the equality expresses is the idea that the grammar codes semantic binding and semantic antecedence.

    But it was even simpler than that in those days, wasn't it? Namely, the requirement "NP1 = NP2" just requires that they are syntactically identical, so the meaning of 'John washes himself' is computed from the deep structure that looks like 'John washes John'. Much as the meaning of 'John was arrested by the police', whatever on earth it might be, is computed from the deep structure that looks like 'The police arrested John'. End of story, right? No semantic values, no binding, no antecedence. (Hence problems with 'Everyone wants to win'.)

    1. I think you are right, though I never thought of it this way before (thx). So, the idea was to reduce antecedence to syntactic identity. This works if syntactic identity is understood as co-reference, but necessarily fails for quantificational expressions (or for that matter for names if used "sloppily"). One way of thinking of a movement theory of anaphora is that it agrees that syntactic identity DOES underlie antecedence (i.e. that indexing does not imply co-reference) and that the right way to think of things is in terms of antecedence. The question then is whether this makes any sense in earlier theories with DS. I suspect not, as we want a distinction between multiple occurrences of the same thing (multiple occurrences of the same copy) vs different occurrences of the same thing (different tokens of the same type). This can be done given current technology if one dumps DS and adopts a chain based account of anaphora. The interpretation is then antecedence based on syntactic identity (i.e. chains). But you are right, I think, in the way you understand the history. Nice.