Wednesday, October 7, 2015

What's in UG (part 1)?

This is the first of three posts on a forthcoming Cognition paper arguing against UG. The specific argument is against the Binding Theory. But the form is intended to generalize. The paper is written by excellent linguists, which is precisely why I spend three posts exposing its weaknesses. The paper, because it will appear in Cognition, is likely to be influential. It shouldn’t be. Here’s the first of three posts explaining why.

Let’s start with some truisms: not every property of a language-particular G is innate. Here’s another one: some features of G reflect innate properties of the language acquisition device (LAD). Let’s end with a truth (that should be a truism by now but is still contested by some for reasons that are barely comprehensible): some of the innate LAD structure key to acquiring a G is linguistically dedicated (i.e. not cognitively general, but due to UG). These three claims should be obvious. True truisms. Sadly, they are not everywhere and always recognized as such. Not even by extremely talented linguists. I don’t know why this is so (though I will speculate towards the end of this note), but it is. Recent evidence comes from a forthcoming paper in Cognition (here) by Cole, Hermon and Yanti (CHY) on the UG status of the Binding Theory (BT).[1] The CHY argument is that BT cannot explain certain facts in a certain set of Javanese and Malay dialects. It concludes that binding cannot be innate. The very strong implication is that UG contains nothing like BT, and that even if it did it would not help explain how languages differ and how kids acquire their Gs. IMO, this implication is what got the paper into Cognition (anything that ends with the statement or implication that there is nothing special about language (i.e. Chomsky is wrong!!!) has a special preferential HOV lane in the new Cognition’s review process). Boy do I miss Jacques Mehler. Come back Jacques. Please.

Before getting into the details of CHY, let’s consider what the classical BT says.[2] It is divided into three principles and a definition of binding:

A.   An anaphor must be bound in its domain
B.    A pronominal cannot be bound in its domain
C.    An R-expression cannot be bound

(1)  An expression E binds an expression E’ iff E c-commands E’ and E is co-indexed with E’.

We also need a definition of ‘domain’ but I leave it to the reader to pick her/his favorite one. That’s the classical BT.

What does it say? It outlines a set of relations that must hold between classes of grammatical expressions. BT-A states that if some expression is in the grammatical category ‘anaphor’ then it must have a local c-commanding binder. BT-B states that if some expression is in the category ‘pronominal’ then it cannot have a local c-commanding binder. And BT-C states, well, you know what it states: if some expression is in the category ‘R-expression’ then it cannot be bound at all.

Now what does BT not say? It says nothing about which phonetically visible expressions fall into which class. It does not say that every overt expression must fall into at least one of these classes. It does not say that every G must contain expressions that fall into these classes. In fact, BT by itself says nothing at all about how a given morphologically/phonetically visible expression distributes or what licensing conditions it must enter into. In other words, by itself BT does not tell us, for example, that (2) is ungrammatical. All it says is that if ‘herself’ is an anaphor then it needs a binder. That’s it.

            (2) John likes herself

How then does BT gain empirical traction? It does so via the further assumption that reflexives in English are BT anaphors (and, additionally, that binding triggers morphologically overt agreement in English reflexives). Assuming this, ‘herself’ is subject to principle BT-A and assuming that John is masculine, herself has no binder in its domain, and so violates BT-A above. This means that the structure underlying (2) is ungrammatical and this is signaled by (2)’s unacceptability.
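
To make the division of labor concrete, here is a toy sketch. It is in no way a serious parser: the data structures, the flat subject-object clause, and the feature-matching shortcut are all my own invention for exposition. It shows how BT-A, plus the learned assumption that ‘herself’ is an anaphor, rules out (2):

```python
# Toy illustration of BT-A on a flat clause where the subject
# c-commands the object. Everything here is invented for exposition.
from dataclasses import dataclass

@dataclass
class NP:
    form: str
    index: int       # referential index
    category: str    # 'anaphor', 'pronominal', or 'R-expression'
    gender: str

def binds(antecedent: NP, dependent: NP) -> bool:
    """In this toy clause the subject c-commands the object, so
    binding reduces to co-indexation. Treating a gender mismatch as
    blocking co-indexation is a simplification of feature matching."""
    return (antecedent.index == dependent.index
            and antecedent.gender == dependent.gender)

def satisfies_principle_a(subject: NP, obj: NP) -> bool:
    """BT-A: an anaphor must be bound in its domain (here, the clause)."""
    if obj.category != 'anaphor':
        return True   # BT-A says nothing about non-anaphors
    return binds(subject, obj)

john = NP('John', 1, 'R-expression', 'masc')
herself = NP('herself', 1, 'anaphor', 'fem')
himself = NP('himself', 1, 'anaphor', 'masc')

print(satisfies_principle_a(john, herself))  # False: (2) 'John likes herself'
print(satisfies_principle_a(john, himself))  # True: 'John likes himself'
```

Note that the verdicts come entirely from the category label plus the structural condition; nothing in the sketch mentions the surface form itself, which is the point made above.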

As stated, there is a considerable distance between a linguistic object’s surface form and its underlying grammatical one. So what’s the empirical advantage of assuming something as abstract as the classical BT? The most important reason, IMO, is that it helps resolve a critical Poverty of Stimulus (PoS) problem. Let me explain (and I will do this slowly, for CHY never actually explain what the specific PoS problem in the domain of binding is (though they allude to the problem as an important feature of their investigation), and this, IMO, allows the paper to end in intellectually unfortunate places).

As BT connoisseurs know, the distribution of overt reflexives and pronouns is quite restricted. Here is the standard data:[3]

(3) a. John1 likes herself*1/*2
b. John1 likes himself1/*2
c. John1 talked to Bill2 about himself1/2/*3
d. John1 expects Mary2 to like himself*1/*2/*3
e. John1 expects Mary2 to like herself*1/2/*3
f. John1 expects himself1/*2/*3 to like Mary2
g. John1 expects (that) heself/himself*1/*2/*3 will like Mary2

If we assume that reflexives are BT-A-anaphors then we can explain all of this data. Where’s the PoS problem? Well, lots of these data concern what cannot happen. On the assumption that the ungrammatical cases in (3) are not attested in the PLD, the fact that a typical English Language Acquisition Device (LAD, aka, kid) converges on the grammatical profile outlined in (3) must mean that this profile in part reflects intrinsic features of the LAD. For example, the fact that kids do not generalize from the acceptability of (3f) to conclude that (3g) should also be acceptable needs to be explained, and it is implausible that the LAD infers that this is an incorrect inference by inspecting unacceptable sentences like (3g), for being unacceptable they will not appear in the PLD.[4] Thus, how LADs come to converge on Gs that allow the good sentences and prevent the bad ones looks like (because it is) a standard PoS puzzle.

How does assuming that BT is part of UG solve the problem? Well, it doesn’t, not all by itself (and nobody ever thought that it could all by itself). But it radically changes it. Here’s what I mean.

If BT is part of UG then the acquisition problem facing the LAD boils down to identifying those expressions in your language that are anaphors, pronominals and R-expressions. This is not an easy task, but it is easier than figuring this out plus figuring out the data distribution in (3). In fact, I doubt that there is any PLD able to fix the data in (3) (this is after all what the PoS problem in the binding domain consists in). And it is obvious that any theory of binding will need to have the LAD figure out (i.e. learn) using the PLD which overt morphemes (if any) are BT anaphors/pronominals (after all, ‘himself’ is a reflexive in English but not in French and I assume that this fact must be acquired on the basis of PLD). So the best story wrt Plato’s Problem in the domain of binding is one where what must obviously be learned is all that must be learned. Why? Because once I know that reflexives in English are BT anaphors subject to BT-A then I get the knowledge illustrated by the data in (3) as a UG bonus.  That’s how PoS problems are solved.[5] So, to repeat: all the LAD need do to become binding competent is figure out which overt expressions fall into which binding categories. Do this and the rest is an epistemic freebie.
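
The epistemic freebie can be illustrated with another small sketch (the lexicon and the configuration label are invented for the example): once the binning has been learned from the PLD, BT mechanically delivers verdicts on configurations the LAD need never have encountered:

```python
# Toy sketch: the only thing "learned from the PLD" is the binning of
# overt forms into BT categories; everything else follows from BT.
# The dictionary and function names here are my own shorthand.

LEARNED_FROM_PLD = {'himself': 'anaphor', 'herself': 'anaphor',
                    'him': 'pronominal', 'her': 'pronominal'}

def bt_predicts_ok(form: str, bound_in_domain: bool) -> bool:
    """Apply classical BT given the learned category.
    bound_in_domain: does the form have a local c-commanding binder?"""
    cat = LEARNED_FROM_PLD[form]
    if cat == 'anaphor':
        return bound_in_domain        # BT-A
    if cat == 'pronominal':
        return not bound_in_domain    # BT-B
    return True

# Predictions for configurations absent from the input:
print(bt_predicts_ok('himself', bound_in_domain=True))   # True: 'John likes himself'
print(bt_predicts_ok('himself', bound_in_domain=False))  # False: *'John expects Mary to like himself'
print(bt_predicts_ok('him', bound_in_domain=True))       # False: *'John1 likes him1'
```

The design point is that the lexicon is the learned part and the conditional logic is the UG part; swap in a French-style binning and the same logic yields the French pattern.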

Furthermore, it’s virtually certain that the UG BT principles act as useful guides for the categorization of morphemes into the abstract categories BT trucks in (i.e. anaphor, pronominal, and R-expression).  Take anaphors. If BT is part of UG it provides the LAD with some diagnostics for anaphoricity. Anaphors must have antecedents. They must be local and high enough. This means that if the LAD hears a sentence like John scratched himself in a situation where John is indeed scratching himself then he has prima facie evidence that ‘himself’ is a reflexive (as it fits A constraints). Of course, the LAD may be wrong (hence the ‘prima facie’ above). For example, say that the LAD also hears pairs of sentences like John loves Mary. She loves himself too, where ‘himself’ is anaphoric to John. Then the LAD has evidence that reflexives are not just subject to BT-A (i.e. they are at best ambiguous morphemes and at worst not subject to BT-A at all). So, I can see how PLD of the right sort, in conjunction with an innate UG-provided BT-A, would help with the classification of morphemes into the more abstract categories using simple PLD.[6]  That’s another nice feature of an articulate UG.
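
A crude illustration of this prima facie strategy (the encoding of PLD observations, and the revision rule, are invented purely for the example; see note 6 for the caveat that this is an illustration, not a proposal):

```python
# Toy sketch: bin a form as an anaphor on first encountering it with a
# local antecedent; revise if later uses contradict BT-A. The data
# encoding and category labels are my own invention.

def classify(observations):
    """observations: list of (form, antecedent_is_local) pairs
    extracted from the PLD."""
    guess = {}
    for form, local in observations:
        if form not in guess:
            # first encounter: local antecedent -> tentative anaphor
            guess[form] = 'anaphor' if local else 'pronominal'
        elif guess[form] == 'anaphor' and not local:
            # counterevidence: not (just) subject to BT-A
            guess[form] = 'ambiguous-or-exempt'
    return guess

pld = [('himself', True),    # 'John scratched himself'
       ('himself', True),
       ('him', False)]       # 'John thinks Mary likes him'
print(classify(pld))  # {'himself': 'anaphor', 'him': 'pronominal'}
```

Feeding in a non-local use of ‘himself’ (the John loves Mary. She loves himself too scenario) would demote it to the ambiguous-or-exempt bin, which is the revision described above.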

Please observe: on this view of things UG is an important part of a theory of language learning. It is not itself a theory of learning. This point was made in Aspects, and is as true today as it was then. In fact, you might say that in the current climate of Bayesian excess it is the obvious conclusion to draw: UG limns the hypothesis space that the learning procedure explores. There are many current models of how UG knowledge might be incorporated in more explicit learning accounts of various flavors (see Charles Yang’s work or Jeff Lidz’s stuff for some recent general proposals and worked out examples).

Does any of this suppose that the LAD uses only attested BT patterns in learning to classify expressions? Of course not. For example, the LAD might conclude that ‘itself’ is a BT-A anaphor in English on first encountering it. Why? By generalizing from forms it has encountered before (e.g. ‘herself’, ‘themselves’). Here the generalization is guided not by UG binding properties but by the details of English morphology.  It is easy to imagine other useful learning strategies (see note 6). However, it seems likely that one way the LAD will distinguish BT-A from BT-B morphemes will be in terms of their cataphoric possibilities positively evidenced in the PLD.
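
The morphological route can be sketched just as crudely (the helper and its inputs are hypothetical, and English-specific):

```python
# Toy sketch: generalize a novel '-self'/'-selves' form to the anaphor
# bin if all previously binned '-self' forms are anaphors. Purely
# illustrative; names and encoding are mine.

def guess_by_morphology(form, known):
    """known: dict of previously binned forms -> category."""
    selfish = lambda f: f.endswith(('self', 'selves'))
    if selfish(form) and all(cat == 'anaphor'
                             for f, cat in known.items() if selfish(f)):
        return 'anaphor'
    return None   # morphology offers no guess

known = {'herself': 'anaphor', 'themselves': 'anaphor', 'him': 'pronominal'}
print(guess_by_morphology('itself', known))  # 'anaphor', on first encounter
print(guess_by_morphology('it', known))      # None
```

Here the generalization is driven by English morphology, not by UG binding properties, exactly as the paragraph above notes.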

So, BT as part of UG can indeed help solve a PoS problem (by simplifying what needs to be acquired) and plausibly provides guide-posts towards that classification. However, BT does not suffice to fix knowledge of binding all by itself nor did anyone ever think that it would.  Moreover, even the most rabid linguistic nativist (I know because I am one of these) is not committed to any particular pattern of surface data. To repeat, BT does not imply anything about how morphemes fall into any of the relevant categories or even if any of them do or even if there are any relevant surface categories to fall into.

With this as background, we are now ready to discuss CHY. I will do this in the next post.

[1] I have been a great admirer of both Cole and Hermon’s work for a long time. They are extremely good linguists, much better than I could ever hope to be. This paper, however, is not good at all. It’s the paper, not the people, that this post discusses.
[2] I will discuss the GB version for this is what CHY discusses. I personally believe that this version of BT is reducible to the theory of movement (A-chain dependencies actually). The story I favor looks more like the old Lees & Klima account. I hope to blog about the differences in the very near future.
[3] As GGers also know, the judgments effectively reverse if we replace the reflexive with a bound pronoun. This reflects the fact that in languages like English, reflexives and bound pronouns are (roughly) in complementary distribution. This fact results from the opposite requirements stated in BT-A and BT-B. The same effect was achieved in earlier theories of binding (e.g. Lees and Klima) by other means.
[4] From what I know, sentences like (3g) are unattested in CHILDES. Indeed, though I don’t know this, I suspect that sentences with reflexives in ECM subject position are not a dime a dozen either.
[5] I assume that I need not say that once one figures out which (if any) of the morphemes are pronominals then BT-B effects (the opposite of those in (3) with pronouns replacing reflexives) follow apace. As I need not say this, I won’t.
[6] Please note that this is simply an illustration, not a full proposal. There are many wrinkles one could add. Here’s another potential learning principle: LADs are predisposed to analyze dependencies in BT terms if this is possible. Thus the default analysis is to treat a dependency as a BT dependency. But this principle, again, is not an assumption properly part of BT. It is part of the learning theory that incorporates a UG BT.


  1. The money quote appears to be:

    “If this analysis of awake dheen is correct, it constitutes a serious challenge for UG-based approaches to Binding. The presence in a language of a form that is used anaphorically but which is exempt from the Binding requirements of UG would impose a considerable burden on the child acquiring the language. The problem is that a child learning to speak a language would need to learn which forms that are functionally anaphoric in their use are subject to UG principles of Binding and which are not. The existence of UG sanctioned categories for anaphora simplifies learning only if all anaphoric elements (in the nontechnical sense of ‘‘anaphoric” that includes both pronouns and reflexives) are subject to UG principles.”

    I can’t really make much sense of this, especially the last sentence. I’m surprised that CH&Y didn’t mention Turkish complex reflexives, which have been known at least since the late 80s to apparently not obey any syntactic binding requirements. A pretty reasonable proposal about how reflexives of this sort work is that they have a complex syntactic structure containing a pronominal, so that the pronominal is shielded from local binding (Kornfilt 2001). This analysis doesn’t require modifying UG to permit the existence of pronouns that are subject neither to Condition A nor Condition B. Some other instances where “shielding” has a more transparent morphological realization are mentioned in Reuland (2001:482). [I’m sure there are much earlier references for some of this stuff; I’m just citing what I know.] Anyway, I don’t see why it would be a big deal if UG were to sanction pronouns that are subject to neither Condition A nor Condition B. It’s not as if CH&Y claim to have found pronouns which obey syntactic constraints on their distribution-under-a-given-interpretation other than those conditions. So the kid still knows that if there is a syntactic constraint at work, it’s Condition A or Condition B.

    1. Could not agree more. I say this at length (and more and more) in the next two installments.

    2. One could also mention logophoric anaphoric uses, which are quite pervasive in some languages. So indeed, that "only if all anaphoric elements" seems to be way too strong.

  2. I'm pessimistic about their result too. It would be one thing if they showed that every logically possible kind of pronominal was possible. They don't. They just show that there are things that refer back that aren't Principle A type or Principle B type. (I thought this was what ziji was? So didn't we already know this?) It would be one thing to show that there aren't any patterns _at all_ in the kinds of pronominals we see, but all this shows is that there's an item somewhere that doesn't fit the pattern. So I preface by saying I think we're on the same page in terms of the conclusion not matching the premises.


    I also think that your statement that "it’s virtually certain that the UG BT principles act as useful guides for the categorization of morphemes into the abstract categories BT trucks in" is wrong. It seems to me that Naho Orita's 2013 Cog Sci paper casts that in serious doubt.

    The paper tried to build a model that learned categories of pronominal morphemes in terms of what syntactic positions their antecedents could be in (syntactic position coded as to whether it was local and/or c-commanding or not). This turned out to be hard. But, contrary to our intuitions, the strategy that worked as a useful guide was _not_ BT. BT - forget just the categories, the full monty - didn't improve anything. Saying "there are two kinds of pronouns, one with local-c-commanding antecedents **and these must be reflexive** and another with non-local/c-commanding antecedents **and these must not be reflexive**" did not work.

    Now we know that it's possible to succeed with this model. If you take the items, give some context in a dialogue, and remove the pronouns, and then get people to tell you, based on the discourse context, what kind of pronoun they expect, then you have, on an item by item basis, some sense of what people's expectations for the reflexive/non-reflexive meaning of the sentence were.

    So, briefly then - the result was that having a guess at the meaning, broad strokes, of a sentence was _sufficient_ to learn that, syntactically, there were two categories of pronouns. Binding principles were not.

    Is having a good guess at the referent a realistic assumption for kids? I'm happy to give that a categorical no. But is it "virtually certain" that BT would help when they don't? By no means. This model says no.

    1. I'd add - the model that had the discourse information didn't have BT coded in it.

    2. I've only skimmed the paper, but I thought the point was that you needed discourse information in addition to syntactic knowledge. The result was that the full monty model didn't work well in the absence of discourse information regarding probable antecedents. As far as I can tell, they didn't test whether the full monty model worked better than the bare bones model when both had access to discourse information.

    3. Let me add: say your aim is to classify some morpheme as BT-A exempt or not. Say it can appear in a clause with a local antecedent. What leads you to think that it is NOT BT-A compatible? Here's some evidence: it can have a non-sentential antecedent (or a non-commanding antecedent). I am not sure what else could tell you that it is BT-A exempt. So, as Alex noted, though the BT is not sufficient to categorize expressions, it is IMO very likely useful.

  3. At the risk of asking a question that has a very obvious answer, what is the evidence that the stimulus is poor to begin with? Can't you learn which anaphora must be c-commanded by a co-indexed noun phrase and which ones can't just by looking at a parsed corpus? E.g., do we know that there aren't enough examples to notice that "himself" is always in the former category and "he" is always in the latter?

    1. Yup, you can figure out given a parsed corpus how to bin overt expressions as anaphors or pronominals. The question is how you generalize to the bad cases and the more complex good ones. Thus, how do you know that you cannot get nominative anaphors (e.g. *John believes that him/he-self is intelligent) but it is ok to get ECM anaphors (e.g. John believes himself to be intelligent)? Also that non-c-commanding antecedents lead to * (e.g. *John's mom loves himself)? Or, given that an anaphor can be in another clause and be ok (see above), why can't it be in the object of the other clause and be ok (e.g. *John believes that Mary loves himself)? Note that many of these involve negative data and (Jeff Lidz tells me) many involve structures that are only anemically represented in the PLD. So how does the LAD generalize from the simple cases to the non-simple ones correctly? Here's the hypothesis: the LAD has sufficient evidence in the PLD to solve the binning problem (categorizing some expression as anaphoric or pronominal) and once it knows this, it knows the rest. It need not learn the rest if it "knows" BT. All it needs to know is which overt expression falls into which BT category. And for this, there appears to be enough info in the PLD.

      BTW, Jeff Lidz tells me that the Orita et al paper (of which one of the et als is Jeff) assumes that you can track clausematedness and c-command in the PLD. Given this, you can bin expressions as anaphoric or pronominal. So, the descriptive vocabulary of BT is presupposed in this work. The work does not show (indeed it cannot, Jeff tells me) how to generalize from the simple to the complex cases. For this one needs BT (or some analogue). However, Ewan is right, the BT itself is not necessary for the results, but the theoretical ingredients of the BT are (locality and c-command).

      Hope this helps.

    2. I understand that the learner needs to have the (innate) capacity to use constructs like c-command and locality to characterize the distribution of words, but I'm not sure what binding theory buys us on top of that.

  4. Hi Norbert -- thanks for the relatively clear post!

    However, I don't understand this bit "On the assumption that the ungrammatical cases in (3) are not attested in the PLD, then the fact that a typical English Language Acquisition Device (LAD, aka, kid) converges on the grammatical profile outlined in (3) must mean that this profile in part reflects intrinsic features of the LAD."

    In particular, what's the logic that's being employed in "must mean"? It seems like there are some pretty strong assumptions about how the LAD is working in order for this to be a valid conclusion.

    Here's the part I get. We assume (A) kids don't hear the ungrammatical versions in (3) (or at least hear them rarely as noise). We assume (B) kids do figure out which are grammatical and which are ungrammatical (seems right). But then we want to conclude that the kid has some built in inductive bias that helps them with (B). That's the part I don't follow.

    (Sorry for being dense -- I'm really trying to understand.)

    1. I wonder if the following interpretation is consistent with Norbert's post: There are some sentence types that people have clear acceptability judgments about despite not having encountered them before (and by "them" I think Norbert literally meant those particular sentence types). So that means people generalize from the primary linguistic input to those novel sentence types. For any generalization to be possible, the hypothesis space needs to be structured in a way that enables that generalization. In this particular case, the claim is that people generalize based on c-command relations, which means that they're capable of representing them. Logically this means that c-command (or some other primitives that c-command can be constructed from) must be "in the LAD", which I take to mean "within human representational capacities". This is admittedly a pretty weak sense of bias, but this might be what Norbert meant.

    2. @hal: I am assuming that there are two options: either the data drives the LAD to the attained knowledge or some intrinsic feature of the LAD drives it to the attained knowledge. If we can show that the first does not obtain then the second must. The modal is driven by the disjunctive options.

      Now, note, I have not said WHAT structural properties of the LAD are doing the driving. BT suffices. Other biases might also serve. However, if the data cannot do it, then something intrinsic to the LAD must.

      Now there are two problems; call them (1) the categorization problem and (2) the inference problem (I owe these terms to Jeff Lidz from whom I eagerly stole them). In the text, BT clearly helped with (2). How? Because it specifies how, GIVEN that we know that something is an anaphor, we also know how it will behave wrt data that the LAD never encounters (the data in 3). How about (1)? Does BT help here? I said it likely did, or could. Orita et al argue that one does not need the full BT to help solve the categorization problem (which overt morphemes are anaphors/pronominals etc). It seems that simply seeding the LAD with the capacity to track c-command and semantic dependency and clausemateness can serve to bin overt morphemes correctly. This does not mean that fuller knowledge of BT might not also serve, just that it is not necessary. I personally find this difference of marginal interest. If we ask the LAD to track BT-relevant properties, that is enough for me (Tal: I don't think that this is negligible. C-command is hardly an obvious property, nor is clausemateness). Still, as Jeff pointed out to me, even if we solve (1) we still need something like BT to solve (2). (2) is where the PoS problem lies.

      Hope this helps.

    3. That's helpful, thanks.

      The disjunction in your first paragraph makes sense to me, I guess the question is what evidence is there that the data doesn't drive?

      It seems like the implicit argument is that "because LAD never sees the negative evidence, it cannot learn that these are disallowed." Is that basically the argument?

    4. As Tal said, it's easy enough to construct sentences involving anaphoric binding where the judgments are clear but which are going to be extremely rare in the input. To pick an example at random, sentences where a PP containing an anaphor is topicalized (possibly across a clause boundary) and the anaphor is bound under reconstruction:

      (1) To each other, the boys talk frequently.
      (2) To each other, Mary said that the boys talk frequently.

      If you haven't heard either (1) or (2), what should your guess be as to their status? Or if you've heard (1) but not (2), should you guess that the clause boundary makes a difference or that it doesn't? Clearly, a LAD could be built in such a way that it did or didn't generalize in the right way in each of these scenarios. So it’s the structure of the LAD, not the data, that led us all to acquire grammars where both (1) and (2) are ok. That doesn’t necessarily mean that there’s no “learning” involved.

    5. I guess I don't understand well enough how we can tell the difference between something that's part of inductive bias and something that's acquired from data. (As long, of course, as Tal says, as your hypothesis space can represent it.) Just because I've never seen a particular construction used in a particular way and I can make judgments about it, _doesn't_ mean that I must have some inductive bias about that construction. For instance, as long as something like binding is in my hypothesis space, and something like topicalization is in my hypothesis space, I don't necessarily need to see all possible combinations of these things in order to generalize to others.

      If the claim is just that the hypothesis class includes this stuff, then I'm fine to (tentatively) agree there. I think going beyond and saying it's part of the IB is saying a lot more and I don't think we have many tools for teasing these apart.

    6. Just because I've never seen a particular construction used in a particular way and I can make judgments about it, _doesn't_ mean that I must have some inductive bias about that construction

      I don't understand this. Say I decide that (1) and (2) are both good (having heard examples of neither construction). If it wasn't an inductive bias that led me to do that, then what was it? Merely having seen binding and topicalization independently doesn't tell me anything about how they'll interact, so I don't see how I could 'generalize' from examples like "The boys talked to each other" and "To John, Mary said that Bill talks frequently". Maybe I'm not allowed to reconstruct at all; maybe I'm allowed to reconstruct only within a minimal clause; maybe I'm allowed to reconstruct over any distance; or ... I don't see how I can decide between all the logically possible options unless I'm biased in favor of one over the others, either because one of the options is the only hypothesis in my hypothesis space, OR because I assign it a higher probability in the absence of any relevant evidence. This might be a broader use of the term "bias" than your use, but I feel that there's more than just a terminological issue here.

    7. I think we (basically) agree on terminology.

      The narrow question to be answered here seems to be something like "assume I've never seen A and B interplay -- of the logically possible truth tables, how can I have gotten the right one?"

      The claim seems to be: it must be either (a) the right one is the only one in my hypothesis space; or (b) the right one is preferred by my inductive bias. (a) is subsumed by (b) for very strong bias.

      But this seems to assume that A and B are sort of atomic elements. Suppose that A decomposes into X,Y,Z and B decomposes into P,Q,R,S? Maybe from other constructions I can learn how (some subset of) XYZ plays with (some subset of) PQRS. This would exhibit itself as "learning from nothing" how A and B interact, but only because my unit of analysis is too broad.

      It seems analogous to saying "if I've never seen the bigram 'pink elephant' before I cannot possibly decide whether this is or is not grammatical." But obviously I can tell, because there's a finer-grained structure hiding behind it that I learned (namely adjectives and nouns).

    8. I think then we agree in the general case, since I agree with what you say when everything is ABC and XYZ. However, the specifics are important in POS arguments. I think in the case of the examples I gave, there is little hope of decomposing things in the way you suggest. I don’t know any syntactic framework where reconstruction for Condition A follows from any very deep principles. I.e., you could easily redefine things such that it didn’t occur with few if any knock-on effects. So you really do need to see some cases where a phrase containing an anaphor moves and the anaphor is bound under reconstruction. You can potentially generalize from other movements to topicalization, but that’s risky. E.g, with respect to Condition C, not all kinds of movement show reconstruction effects, so an eagerness to generalize across movement types would get you into trouble there.

    9. GOT IT!

      So basically the argument is that we haven't found / there might not be smaller pieces into which to divide these things. And it sounds like this is not for lack of trying :)

      Thanks for walking me through!

    10. I don't think locality and c-command are negligible (especially locality, which can be defined in so many ways), I just didn't understand what an innate binding theory buys us on top of those primitives. Alex's reconstruction in topicalization might be relevant to this, but I'm not sure I understand how yet.

    11. @Tal: We seem to be on the same page then. BT has four parts: (i) a licensing condition which has 2 parts, (a) c-command and (b) locality, (ii) an inventory of categories (anaphors, pronominals, others) and (iii) a modal requirement (Anaphors MUST…, pronominals CANNOT…, R-expressions CANNOT…). If we build (i) into the "learner" (as we both agree is required), then we still need the other two, though not perhaps for the purpose of categorizing surface morphemes into abstract BT classes. We need them for inferential reasons. Thus, when you know that 'John likes himself' is good, how does that license your knowing that 'It seems to himself that Mary left' is bad? You could say that this is just part of the learner (assume something is obligatory unless you've seen ok cases which violate it). But this is still built into the learner and it is not clear that this principle is a general truth of learning rather than of language learning. Same with BT-B. The more challenging cases for which the modality seems useful are the ones in (3) that involve generalization from simple cases. How to understand that 'John wants very much for himself to win' is ok but 'John hopes that him/he-self will win' is not? We seem to generalize in some ways but not others. What are the licit ways? BT aims to tell us.

      So, yes, you get a lot just with locality and cc. But I don't think you get it all. And for that you need BT. Do you need it for categorization? I suspect not. Could you use it if you have it? Why not? So, that's the value added of BT.

  5. This comment has been removed by the author.

  6. I have a question about the POS argument related to (3g), here split up into two examples:

    (3g-i) *John expects that himself will like Mary
    (3g-ii) *John expects that heself will like Mary

    The idea here is that it is impossible to figure out the ungrammaticality of these examples without some prior BT knowledge. However, for (3g-i) one could say that you have a non-nominative subject in a finite subject position, the impossibility of which has to be acquired independently of BT data. So it’s at least not so clear this is a case of binding POS. For (3g-ii), the question arises why “heself” is an impossible word. One answer could be BT (perhaps with the locality parameter switched to “finite clause”), which forbids nominative anaphors. But is this the only conceivable explanation? Is it conceivable that the impossibility of “heself” reflects a conservative learning strategy pertaining to the acquisition of morphology more generally? “Self” attaches to accusative pronouns only, much like –ize attaches to adjectives only. Or do we want to see a POS problem in the acquisition of –ize too? Just want to be sure that we have tried everything else :)

    1. @Olaf: Some of this is an artifact of using English as the language of demonstration. In ergative languages, there are certainly absolutive anaphors, and – if the embedded finite clause in (3g) is made to be unaccusative, for example – there would be a morphological form available for the counterpart of (3g).

      So one interesting question is this: is there an ergative language with reflexive anaphors that are Condition A obeyers, and that has anything like the ECM construction in (3f)? If the answer is "no", that might be interesting (depending on whether ECM is statistically common enough to make that gap meaningful).

    2. Yes, I thought this could be the case. It is important though to establish which POS arguments can be made on the basis of English data alone and which require a broader typological look before we use them in debates with non-UG adepts.

    3. I think that there is another fact worth thinking about: why is there such a paucity of nominative-marked reflexives? My recollection is that there are virtually none out there. This cannot be a mere morphological gap, for there is nothing apparently wrong morphologically with he-self/she-self/they-selves/we-selves as morpho forms. In fact, given the existence of him/her-self, why wouldn't the child generalize to the full paradigm? But this doesn't happen. In fact, things are more interesting still. Again, as I recall, Rizzi and Woolford noted that anaphors are incompatible with agreement (anaphors cannot occur in agreement positions). This suggests that the problem with nominatives is that they induce agreement. This gets some further support from Icelandic, where quirky anaphoric subjects are ok but nominative ones are not. At any rate, I don't think that the problem in English is morphological but more general: why no nominative reflexives? Indeed, why no reflexives in agreement positions? There are several stories out there that try to answer this question in a semi-principled manner. Hope this helps.

    4. I have little doubt that Olaf knows about the anaphor agreement effect, Norbert :-) I think what he's pointing out is that if that is independently a principle of grammar, then one could offer the following explanation for why (3f) is not extended, by analogy, to (3g):

      i. using "himself" is out for case reasons
      ii. "heself" is ruled out by the anaphor agreement effect

      Now, it could very well be that one or both of (i-ii) adhere, themselves, to the logic of PoS (I think you, Norbert, are implying that (ii), at least, does). And so, when taken together, there is still a PoS argument to be had here, but it does not fit tightly within the confines of "is BT innate." (Though, of course, the latter doesn't really matter unto itself; the real question, I think we all agree, is whether facts like (3f-3g) in a language like English form a PoS puzzle or not.)

    5. Sorry, not sure I follow. There seems to be a PoSish question: why don't we find nominative anaphors? Answer: it's out by the binding theory. This is a good answer which, from what I can tell, we have no other answer for. It seems that this is actually part of a larger fact, one that is not visible in English (the anaphor agreement effect). The question for PoS is why LADs don't extend the logic of (3f) to (3g), allowing it to be ok. Indeed, why don't they extend the morphological logic from himself to heself etc.? Maybe I am missing something.

    6. @Norbert: The nonexistence of nominative anaphors is not reducible to binding theory. To wit, the contrast between dative subjects of embedded finite clauses in Icelandic (which can be anaphors) and nominative ones (which cannot). [see Rizzi, Woolford]