Monday, November 3, 2014

More on PoS2

In a previous post (here), I confessed to hubris: I thought that solving PoS1 problems sufficed to finesse PoS2 difficulties. I now think I was wrong about this. But hey, I was young and brash and now I am mellow and judicious. Just the exuberance of youth! In this post, I’d like to consider two issues: (i) what models we have for investigating how LADs acquire their particular Gs given a specification of the possible human Gs and (ii) how PoS1 conclusions might be leveraged to address PoS2 concerns.

IMO, GG has gotten further in limning the limits of G-hood than we have gotten in explaining how LADs move through the space of possible Gs to the actual Gs acquired. GG has made serious progress in addressing PoS1 issues. We can tell pretty good stories about G-invariant properties (i.e. why certain kinds of dependencies are unattested in human Gs (e.g. Islands, ECP, CED etc.) and can also sketch out accounts regarding those things that Gs must contain (e.g. anaphors must be “close” to their antecedents in a way we can specify pretty well)). This provides us with pretty good accounts of what sorts of Gs are impossible (i.e. what kinds of dependencies Gs will never contain).

In contrast, we do not have particularly good accounts of why speakers acquire the particular Gs that they do (e.g. why does English obey the Fixed Subject Constraint (FSC) but Italian doesn’t? Why isn’t English pro-drop? Why doesn’t English have resumptive pronouns?).[1] That said, the acquisition, diachronic and typological literatures do offer accounts aimed in this direction.[2] These accounts fall into two basic types.

The first kind of account parameterizes a principle, the classic case being the CP/TP parameter for bounding nodes. You remember the story, based on work by Rizzi.[3] The Subjacency Principle is an invariant UG principle. However, the principle is defined over bounding nodes, and these can vary across Gs. The differences between English and Italian with regard to extraction from embedded questions reduce to the fact that in English TP is a bounding node while in Italian CP is. This difference suffices to explain both the similarities and the differences with regard to island extraction in the two languages.[4] On this story, acquisition amounts to fixing the value of the bounding node parameter for your language.
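To make the logic of the bounding-node story concrete, here is a toy sketch in Python. It assumes, as a simplification, that each movement step can be listed as the node labels it crosses (those dominating its launch site but not its landing site), that DP plays the role of NP, and that the two example derivations are schematic. The names and data are hypothetical illustrations, not Rizzi’s formalism.

```python
# Toy model of parameterized Subjacency. Each movement step is given as the
# list of node labels it crosses, i.e. nodes dominating its launch site but
# not its landing site. Labels and derivations are schematic simplifications.

BOUNDING_NODES = {
    "English": {"TP", "DP"},  # TP (= S) is a bounding node
    "Italian": {"CP", "DP"},  # CP (= S') is a bounding node instead
}


def step_is_subjacent(crossed, language):
    """Invariant UG condition: a step may cross at most one bounding node."""
    bounding = BOUNDING_NODES[language]
    return sum(1 for node in crossed if node in bounding) <= 1


def derivation_is_subjacent(steps, language):
    """A derivation is licit only if every one of its steps is subjacent."""
    return all(step_is_subjacent(crossed, language) for crossed in steps)


# "What do you think that John read t?": successive-cyclic movement
# through the embedded Spec CP.
long_extraction = [
    ["TP"],        # object position -> embedded Spec CP
    ["CP", "TP"],  # embedded Spec CP -> matrix Spec CP
]

# "Which book do you wonder who read t?": embedded Spec CP is occupied by
# "who", so the wh-phrase must reach matrix Spec CP in a single step.
wh_island_extraction = [
    ["TP", "CP", "TP"],  # embedded TP, embedded CP, matrix TP
]

for language in ("English", "Italian"):
    print(language,
          derivation_is_subjacent(long_extraction, language),
          derivation_is_subjacent(wh_island_extraction, language))
```

Under these assumptions the same invariant check gives different verdicts for the two languages purely because the set of bounding nodes differs: ordinary long extraction passes in both, and only the wh-island extraction separates English from Italian.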

In a very interesting paper in progress (hence I cannot link to it, sorry, but be patient), Dustin Chacon, Mike Fetters, Margaret Kandel, Eric Pelzl and Colin Phillips (CFKPP) call this “direct learning.” How is the value fixed? By induction from the PLD (choose your favorite inductive theory), which, it is hoped, provides sufficient amounts of robust data to allow the LAD to directly fix the value of the parameter (see note 5). To my knowledge (and please correct me if there is stuff out there that contradicts what I am about to say), we are still not sure if the actual PLD available in Italian and English suffices to fix the two possible values.[5]

The second kind of account keeps principles fixed (no parametric variation of the principles) but allows for derivations that circumvent the relevant universal condition. This is similar to CFKPP’s conception of “indirect learning.” There are several examples of this. Consider Reinhart’s proposal that CP can have more than one “escape hatch,” which allows two WHs to move to an embedded Spec CP position and thereby lets one of them exit while still adhering to the Subjacency Condition. On this view the different data are not traced to a parameter within bounding theory, but to another kind of fact, namely that different Gs allow for different kinds of rules (viz. English Gs only allow CP expansion rules with a single CP specifier, while other Gs (e.g. Romanian/Bulgarian) might allow more than one, thereby leaving a Spec C exit for a second A’-mover). There are potential degree-0 data that could fix this (e.g. sentences like “Who what bought” would support the conclusion that CP can house multiple WHs). However, the only investigations of the PLD that I know of (by Lydia Grebenyova, for Russian) suggest that multiple interrogatives are very far from ubiquitous in the PLD (actually, there are none). If so, how the rule allowing multiple Spec Cs would be acquired remains a mystery.

Let’s consider another example, where a much more satisfying story exists (CFKPP discuss this case at length and explore its subtleties). Take Rizzi’s explanation for why English, but not Italian,[6] is subject to the Fixed Subject Constraint (FSC), as in (1a/b):[7]

(1)  a. *Who1 do you know that t1 ate a large supper (English)
b.  Who1 do you know that t1 ate a large supper  (Italian)

The account has the following components:
(2)  a. Something like the FSC (non-parameterized) is part of UG
b. Italian has a way of evading the requirements of the FSC, but English doesn’t
c. That Italian can generate structures that evade the FSC is manifest in simple Italian clauses

More concretely, FL/UG contains something like the that-t filter. It stars structures in which C0 governs a trace (e.g. *[CP … C [ t1…]]). Italian (but not English) allows for post-verbal subject constructions, in which the subject DP is not in the government purview of C:[8]

(3)  a. Had telephoned John (ok-Italian/*-English)
b. [ C [   [had [VP telephoned John]]]]

Because a WH moving from the position of John in (3b) does not generate a structure subject to the FSC, sentences like Who do you think that phoned are predicted to be fully acceptable in Italian. In other words, Italian does respect the FSC, and the FSC is exactly the same in English and Italian. The difference between them is that Italian allows the effects of the FSC to be evaded by allowing movement from post-verbal subject position.
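The indirect logic of this account can be sketched in a few lines of Python. The filter below is the same function for every G; what varies is only whether a G generates post-verbal subjects, and hence which launch sites are available to an extracted subject. Treating only an overt complementizer as a governor, and reducing structures to two named launch sites, are simplifying assumptions of this hypothetical sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    name: str
    postverbal_subjects: bool  # can the subject stay low, below Spec T?


def violates_that_trace_filter(overt_that, launch_site):
    """UG filter, identical in every G: an overt C ('that'/'che') may not
    govern a trace in the preverbal subject position (Spec T)."""
    return overt_that and launch_site == "SpecT"


def subject_extraction_ok(grammar, overt_that=True):
    """True if SOME derivation extracting the embedded subject passes the filter."""
    launch_sites = ["SpecT"]
    if grammar.postverbal_subjects:
        launch_sites.append("postverbal")
    return any(not violates_that_trace_filter(overt_that, site)
               for site in launch_sites)


ENGLISH = Grammar("English", postverbal_subjects=False)
ITALIAN = Grammar("Italian", postverbal_subjects=True)

print(subject_extraction_ok(ENGLISH))                    # (1a), English
print(subject_extraction_ok(ITALIAN))                    # (1b), Italian
print(subject_extraction_ok(ENGLISH, overt_that=False))  # English, no 'that'
```

Note that nothing in the filter mentions Italian: the contrast falls out of an independent, observable property of the G.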

Two things to note. First, post-verbal subjects are not rarities in Italian (or in Spanish, which is similar), so we expect them to arise frequently and robustly in the PLD. This should provide plenty of PLD fodder for whatever rules generate post-verbal subject constructions in Italian and Spanish.

Second, having post-verbal subjects suffices to evade the FSC, but there may well be other ways of doing so. Still, this appears to be a very common way of evading the FSC. CFKPP reviews the FSC variation literature and suggests that there are not all that many ways to skirt the FSC. Before reading CFKPP, I was under the impression (based on widely cited work by Sobin) that certain dialects of English provided evidence that one could evade the FSC in other ways (English does not have post-verbal subjects). However, CFKPP provides excellent reasons (based in part on work by Cowart) to think that Sobin’s findings are at best inconclusive and most likely incorrect.

CFKPP does something else that is very important: it actually tries to estimate how much data there is in actual PLD bearing on the FSC in both English and Spanish/Italian (effectively the same language for FSC purposes). Bottom line: not very much at all, so were the LAD required to “directly learn” whether the FSC held, it would have a very difficult time doing so. There is just not that much direct data bearing on it. Instead, the child seems to assume that it holds universally. However, this does not imply that every language will appear to respect the FSC, for there may be indirect ways of meeting its requirements while still deriving sentences that allow traces abutting Cs. As post-verbal subject constructions provide such an out, the differences between English and Italian follow even if the FSC is left unparameterized.[9]

Note, btw, that this kind of analysis highlights the difference between a Chomsky universal and a Greenberg universal. On this story the FSC regulates Italian Gs just as much as English ones, despite its effects being invisible in Italian. In other words, the FSC holds in Italian despite never appearing to hold there. This makes sense on Chomsky’s conception of universals but not on Greenberg’s. Chomsky universals are generalizations about structures; Greenberg universals are generalizations about surface forms. They are very different, though far too often confused (as I have railed about in previous posts; see here and here for a reprise).

OK, back to the main point, and then I end. There is lots of G variation, and this means that some properties of Gs are acquired on the basis of actual PLD. When one looks carefully, it appears that for many kinds of variation there is really not that much PLD to go on, and this raises a PoS2 problem. We have a couple of examples of how to solve such PoS2 problems. However, relatively little attention has been paid to the specific acquisition problems that G variation raises (I also plead guilty here). Regarding these, CFKPP presents a useful, classical PoS challenge to people of my ilk:

We challenge theoretical syntacticians working on any phenomenon that varies between languages to consider whether the phenomenon in question lends itself to direct observation or not. If not, it must be conditioned on other observable phenomena. This can serve as a useful heuristic for constructing accounts of phenomena in comparative syntax. (20)

Yes, yes and yes again. Note that in cases where indirect stories are required, looking for them can generate interesting research into the possible variation among Gs. The Rizzi account of the FSC above begins by assuming that the FSC is universal and then looks for ways that particular Gs might circumvent it. Such cases of indirect acquisition leverage what we believe to hold given standard PoS1 considerations. So why does Italian appear not to obey the FSC? Not because the that-t filter doesn’t hold in Italian, but because the Italian G allows for derivations that circumvent its strictures. How do Italian Gs do this? By allowing for post-verbal subjects, which permit licit “subject” A’-movement derivations. Is this fact about Italian Gs learnable? Yes. Post-verbal subjects are not rare, and so the Italian LAD has evidence for postulating rules to generate these structures, while the English kid does not. So, PLD-driven acquisition plus fixed UG principles can lead to plausible accounts of G variation (i.e. to stories addressing the question of how John/Gianni acquired the particular Gs they did). What’s the moral? Don’t parameterize your principles; look instead for G rules/structures that would render them empirically mute. This sort of strategy suggests taking attested universals very strictly (i.e. as not parameterized), as they serve as boundary conditions on adequate descriptions of particular Gs. Thus, though PoS1 considerations don’t directly solve PoS2 problems, in particular contexts they suggest approaches to G variation that can circumvent PoS2 problems.
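The contrast between direct and indirect learning can be caricatured as two learners reading the same input. Everything here is hypothetical: the utterance tags, the threshold of 5, and above all the toy counts, which are invented for illustration and are not CFKPP’s corpus estimates. The point is only structural: the direct learner waits for rare that-t evidence, while the indirect learner needs only frequent post-verbal subjects.

```python
from collections import Counter

# Hypothetical utterance tags; a real study would have to annotate
# child-directed speech (e.g. CHILDES transcripts) for these properties.
POSTVERBAL_SUBJECT = "postverbal_subject"
THAT_TRACE = "subject_extracted_over_that"


def direct_learner(pld, threshold):
    """Parameterized FSC: returns True (subject extraction over 'that' is OK)
    only after enough direct that-t evidence has been observed."""
    return Counter(pld)[THAT_TRACE] >= threshold


def indirect_learner(pld, threshold):
    """Unparameterized FSC: learns only whether post-verbal subjects exist.
    Returns True (extraction over 'that' is OK) if they do, since the
    post-verbal launch site evades the filter."""
    return Counter(pld)[POSTVERBAL_SUBJECT] >= threshold


# Invented toy input, NOT corpus counts: post-verbal subjects are frequent,
# direct that-t evidence is close to absent.
italian_like_pld = [POSTVERBAL_SUBJECT] * 40 + [THAT_TRACE] * 1 + ["other"] * 200
english_like_pld = ["other"] * 241

for name, pld in (("Italian-like", italian_like_pld),
                  ("English-like", english_like_pld)):
    print(name,
          "direct:", direct_learner(pld, threshold=5),
          "indirect:", indirect_learner(pld, threshold=5))
```

With these made-up numbers only the indirect learner separates the Italian-like input from the English-like one, which is the shape of the argument above.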

Last point: I’ve lamented the fact that we’ve stopped holding syntacticians’ feet to Plato’s Fire. We should constantly be asking of comparative syntax proposals what the acquisition scenario might be. We have, IMO, refrained from doing this of late (and I include myself here). I suspect that the reason for this is that we’ve all been seduced into doing languistics rather than linguistics. We have stopped thinking of syntax as a method for investigating FL and have adopted the view that the ultimate goal of syntax is to explain syntactic patterns, rather than to use syntactic patterns to investigate the fine structure of FL. That’s unfortunate for many reasons, not the least of which is that it serves to Balkanize the discipline. If syntacticians refuse to take responsibility for the cognitive relevance of their results, why should anyone else listen?

It’s not too late to change this. I again suggest that at every variation talk we ask how the proposed variation might be acquired. Syntacticians should be expected to have thought about this problem in developing their proposals. Maybe we should start asking them to specify what kind of data could drive the acquisition of the presented variation and whether such data are plausibly available in the PLD the child has access to. We now have quite a few CHILDES data sets, and syntacticians could be encouraged to peek at these in making their proposals. Having a workable solution is too high a bar. Having thought about the problem, considered the possibly relevant PLD, and entertained possible solutions is not. After all, if a proposed account of a given variation is unacquirable, that is an excellent reason for thinking that the analysis is wrong.

[1] Note that even here, we do not address the specific LAD question but idealize to a situation where we aggregate Gs and reify them as languages. So nobody studies why/how Norbert acquires his idiosyncratic G, but rather how a typical English speaker acquires G_English, an object that, strictly speaking, does not exist.
[2] By this I do not mean to imply that there is no good and solid work on this issue. I’ve discussed lots of it before. Berwick, Polinsky, Lidz, Yang, Guasti, Rizzi, Lightfoot, Roberts, Dresher, Fodor, Sakas and many others have addressed this question fruitfully. That said, I think we understand this issue less well than we do PoS1 concerns.
[3] Amusingly, the parameter theory is suggested in a footnote in Rizzi’s deservedly famous paper; the paper itself presented a different story. The parameter idea really took off with LGB, where Rizzi’s discussion was reworked in a systematic way that gave us the P&P architecture.
[4] I am reporting the history here. Grimshaw provided what to my mind was pretty compelling evidence that this was the wrong way to describe the data.
[5] If one assumes that the English G is the unmarked case, then the investigation should concentrate on Italian PLD. The data required to fix CP as the value are actually quite recondite, at least if eyeballed informally. Under standard Degree 0+ assumptions, violations of the WH-island constraint could not serve as PLD. So what might? Extraction from subject islands might (e.g. Of which Ferrari did the driver crash into the wall?), but I would bet that such data are few and far between in actual Italian PLD. Thus, direct evidence for the CP/TP parameter is, I suspect, pretty rare in the PLD, and so directly fixing the value of the parameter should be pretty challenging. At present, I have no idea how such a parameter might be fixed.
[6] Of course, strictly speaking there is no such thing as English or Italian. Even in these cases we idealize and study abstractions rather than particular individuals.
[7] I am using Anglicized Italian, so excuse the accent.
[8] CFKPP discuss the that-t version of the FSC and understand the constraint in terms of adjacency. This may be right, but I doubt it. I suspect that what’s at stake is not adjacency but hierarchical proximity, inverted subjects being lower than Spec T. However, for what follows the details don’t matter much.
[9] There is a great paper by Brandi and Cordin testing Rizzi’s proposal on non-standard Italian dialects. It’s here. This really is a fun read, and if you’ve never looked at it, you are in for a treat. The basic idea is that certain dialects can tell us overtly whether a WH is moving from Spec T or from a lower verbal position. In particular, movement from Spec TP is signaled with an obligatory subject clitic. Only if this clitic is absent is movement of a “subject” permissible. Take a look; it’s very pretty syntax.


  1. These are great points, to which I add a few observations.

    On my view, POS1 and POS2 are what LGB factors out into the core and the periphery. The core parametric system consists of the set of possible Gs (thereby ruling out the impossible Gs), and the mechanism of learning is parameter setting (more anon). The acquisition of the periphery deals with idiosyncrasies: exceptions, noise, nursery rhymes, and all the other messy bits of the primary linguistic data. This amounts to a garbage detector: the core system is not compromised (the child isn’t misled by noise, exceptions …) while lexicalized exceptions can be committed to memory accordingly. I have written a bit on the garbage detection problem in some of the earlier posts.

    On parameter setting (POS1): CFKPP’s challenge to theoretical syntacticians, which you (and I) endorse, amounts to a return to the golden age of GB. Even when I started, a lot of syntax papers in the canon (written in the 1980s) were raising, and tackling, these challenges all in one place: a parameter is proposed, and the author very often has at least an informal discussion of what kind of data would be sufficient to distinguish the target values. Informal, to be sure, and never verified against child-directed data, but it seemed that the goal of linguistic theory was very much tied to the problem of language acquisition, and everyone was thinking about it in their day job. There is very little of that these days.

    The challenge can be met in small steps as well as long strides. For simple cases, it may be possible to work out what kind of data would support alternative parameter values, and we can then dive into the language-specific data in CHILDES to start running correlations. But heroic projects like Sakas and Fodor's are necessary for dealing with intricately interacting parameters. There needs to be more of that. (Parameters and big data: sounds like a match made in heaven.) It’s a shame that this didn’t happen sooner, and that the problem of acquisition (and POS) no longer seems to be at the forefront of syntactic theorizing.


  3. I can't wait to read this paper; it sounds like it should be a good one ;) A few notes on your notes:

    Our corpus was constructed in such a way that we actually looked for CP/TP bounding phenomena in addition to that-t effects, and the picture there did not look very encouraging either. Since Rizzi/Grimshaw, the focus on these kinds of extractions seems to have narrowed to explaining just the variability in extractions out of subjects, so I'm not even sure what the indirect learning story might be for wh-island (non-)violations, but I might just not have read the right papers. Given the data available to us at the moment, this seems to be in desperate need of an indirect learning story, insofar as the crosslinguistic variability claims are true. Actually, I suspect that most, if not all, of the learnability conditions on apparent variability in islands will need to be considered very, very carefully when framed this way, if they are robust across speakers.

    On your point about how homogenizing labels like "English" and "Spanish" are: the work by Han, Lidz and Musolino was really influential in how we approached this issue. A very likely outcome might have been that English speakers really DID show rampant variation, precisely because it's hard for learners to correctly infer whether their language doesn't have the that-t effect (or conversely for Spanish speakers). If so, then we would have wanted to AVOID the Rizzian story. In other words, if it turns out that people exposed to similar PLD come to different conclusions about their grammar, then one wants a *less* learnable theory of the difference. In fact, this was one of the things that made us skeptical of the Sobin facts: if it were the case that Dialect A systematically had FSC effects, but not Dialect B, then there would need to be some PLD that distinguished Dialect A from Dialect B, and last I knew Midwesterners didn't have postverbal subjects or rich agreement :)