Friday, September 2, 2016

Brains and syntax : part 1

William Matchin has been a post doc at UMD working with Ellen Lau. He is interested in how brains use grammars in real time. He has written a paper describing the frustrations that a neuroscientist has when approaching syntacticians for help. He has also provided some positive suggestions for how to move forward. I present his thoughts in two parts. First the problem as he sees it and, sometime over the weekend, I will post his suggested solutions. It's interesting stuff.


Linking syntactic theory to behavior and brains

It is very frustrating to work in the cognitive neuroscience of syntax within the mainstream Generative Grammar (GG) framework when there are essentially no real linking theories on offer between syntactic theory and online sentence processing. At present, the connection between syntactic theory and whatever people do when they hear and produce sentences is completely opaque, as is how the mature language system develops during acquisition. My present point is to underscore the essentiality of establishing such a linking theory. I truly believe that any cognitive neuroscience of language that seeks to incorporate the insights of Generative Grammar absolutely needs such a linking theory. Of course, cognitive neuroscience of language can proceed without incorporating the insights of syntactic theory, and this is often done – most people working on syntax attempt to localize some vague, a-theoretical notion of “syntactic processing” without clearly defining what this is. An even clearer example of departure from syntactic theory is recent work that posits certain brain areas that are “core” language areas, without defining what language is beyond “you know it when you see it” (Fedorenko et al., 2011). Is that what we want for neuroscience investigations of language – near total disregard for GG? I don’t. The whole reason I am (was) here at UMD is because I find syntactic theory very deep, both for descriptive and explanatory adequacy, and I in fact think that the Minimalist program in particular may allow bridging the gap between linguistics and the fields of neuroscience and evolutionary biology.

There are reasons for this disregard, a major one being that nobody talks about how a Minimalist grammar is used. We certainly have plenty of insightful work in acquisition and psycholinguistics that tells us when children know certain grammatical constructions (e.g., Lukyanenko et al., 2014) or when certain grammatical constraints are used online (e.g., Phillips 2006), but we don’t have any strongly plausible suggestions as to what happens mechanistically. For example, it seems that people don’t search for gaps inside of islands, but why don’t they? How is the grammatical knowledge deployed in real time such that people don’t try to find a gap inside an island? This issue is a fundamental question for my line of work, and one that remains unanswered. For a related example in the world of brains, there is a very close connection between the syntactic properties of sentences and activation in language-relevant brain areas (e.g., Pallier et al., 2011; Matchin et al., 2014) – but what does this mean with respect to the function of these brain areas? Are these areas “Merge areas”, or something else? If something else, what is our theory of this something else (that takes into account the fact that this area cares about structure)? This sort of question applies to pretty much every finding of this sort in psycholinguistics, language acquisition, and cognitive neuroscience.

My work rests on experiments of language use, in normal people during brain scanning or in patients with brain damage. I attempt to explain why brain areas light up the way that they do when people are producing or comprehending language, or why patients have particular problems with language after damage to certain brain areas, and I try to connect these notions with syntactic theory. However, it is very hard to proceed without knowing how the postulates of syntactic theory relate to behavior. Here is just a sample of major questions in this regard that exemplify the opacity between syntactic theory and online processing:

·      When processing a sentence, do I expect Merge to be active? Or not?
·      What happens when people process things less than full sentences (like a little NP – “the dog”)? What is our theory of such situations?
·      Do derivations really proceed from the bottom up, or can they satisfactorily be switched to go top-down/left-right using something like Merge right (Phillips 1996)?
·      What happens mechanistically when people have to revise structure (e.g., after garden-pathing)?
·      Are there only lexical items and Merge? Or are there stored complex objects, like “treelets”, constructions, or phrase structure rules?
·      How does the syntactic system interact with working memory, a system that is critical for online sentence processing?

These things are not mentioned in syntactic theory because of the traditional performance/competence separation (Chomsky, 1965). There did use to be some discussion of these linking issues in work that sought to bridge the gap between syntactic theory and online sentence processing (e.g., Miller & Chomsky, 1963; Fodor et al., 1974; Berwick & Weinberg, 1983), but there does not seem to be any such discussion currently, at least for Minimalism. In order for me to do anything at all reasonable in neuroscience with respect to syntax, I need to have at least a sketch of a theory that provides answers to these questions, and such a theory does not exist.

There are syntactic theories on the market that do connect (somewhat) more transparently to behavior than mainstream GG – those of the “lexicalist” variety (e.g., Bresnan, 2001; Vosse & Kempen, 2000; Frank, 2002; Joshi et al., 1975; Lewis & Vasishth, 2005), with the general virtues of this class of theory, including the very virtues of transparency to online behavior, summarized by Jackendoff (2002) and Culicover and Jackendoff (2005; 2006). In my mind, Jackendoff and Culicover are right on the point of transparency – this kind of grammatical theory does connect much better with what we know about behavior and aphasia. At the very least, it seems to me impossible to even get off the ground in discussions of psycholinguistics, neuroimaging or aphasia without postulating some kind of stored complex structures, “constructions” or “treelets”, or perhaps old-fashioned phrase structure rules that might fill an equivalent role to treelets (see Shota Momma’s 2016 doctoral dissertation for an excellent review of this evidence for psycholinguistics, hopefully available soon :)). Minimalist grammars do not provide this level of representation, while lexicalist theories do.

Here is a set of fundamental observations or challenges from psycholinguistics and neurolinguistics that any kind of linking theory between syntax and online sentence processing should take into account:

·      Online processing is highly predictive and attempts to build dependencies actively (Omaki et al., 2015; Stowe, 1986 – filled-gap effect)
·      Online processing very tightly respects grammatical properties (e.g., phrase structure, Binding principles, Island structures) (Lewis & Phillips, 2015)
·      Stored structures of some kind seem necessary to capture many behavioral phenomena (Momma, 2016; see also Demberg & Keller, 2008)
·      The memory system involved in language appears to operate along the lines of a parallel, content-addressable memory system with an extremely limited focus of attention (Lewis et al., 2006)
·      The main brain “language areas” are highly sensitive to hierarchical structure and other grammatical properties (Embick et al., 2000; Musso et al., 2003; Pallier et al., 2011)
·      Damage restricted to Broca’s area (the main language-related brain region) results in only minor language impairments, not fundamental issues with language (Linebarger et al., 1983; Mohr et al., 1978)
·      Contra Grodzinsky (2000), there doesn’t seem to be any class of patients that is selectively impaired in a particular component of grammar (e.g., Wilson & Saygin, 2004), implying that core grammatical properties are not organized in a “syntacto-topic” fashion in the cortex
·      The neuroimaging profile of “language areas” indicates that while these areas are sensitive to grammatical properties, their functions are not tied to particular grammatical operations but rather with the processing ramifications of them (Rogalsky & Hickok, 2011; Stowe et al., 2005; Matchin et al., 2014; Santi & Grodzinsky, 2012; Santi et al., 2015)

As I explain in more detail later in this post, a language faculty that makes prominent use of stored linguistic structures and a memory retrieval system operating over them allows us to make coherent sense out of these kinds of findings.

At any rate, it seems painfully true to me that the ‘syntacto-topic conjecture’ (Grodzinsky, 2006; Grodzinsky & Friederici, 2006), the attempt to neurally localize the modules of syntactic theories of GG (e.g., Move alpha, Binding principles, fundamental syntactic operations, etc.), has completely failed. Let me underscore that – there are no big chunks of dedicated “grammatical” cortex to be found in the brain, if grammatical is to be defined in these sorts of categories. At any rate, did we want to abandon the Minimalist program and re-adopt Government and Binding (GB) theory in order to localize its syntactic modules? One of the virtues of the Minimalist program, in my view, is that it seeks a more fundamental explanation for the theoretical postulates of GB (Norbert’s numerous blog discussions of this issue), which we didn’t want as the primitive foundations of language for reasons such as Darwin’s problem – the problem of how language emerged in the species during evolution. Incidentally, the lack of correspondence between GB and the brain is another reason to pursue something like the Minimalist program, which possesses a much slimmer grammatical processing profile that wouldn’t necessarily take up a huge swath of cortex. Positing a rich lexicon with a slim syntactic operation seems to me to be a very plausible way to connect up with what we know about the brain.

Except for the fact that I know of no linking theory between grammar and behavior for a Minimalist grammar aside from Phillips (1996). And even that linking theory really only addresses one issue listed above, the issue of derivational order – it did not answer a whole host of questions concerning the system writ large. Namely, it did not provide what I believe to be the critical level of representation for online sentence processing – stored structures. So I have no way of explaining the results of neuroimaging and neuropsychology experiments in Minimalist terms, meaning that the only options are: (1) adopt a lexicalist grammatical theory a la Culicover & Jackendoff (2005; 2006) and eschew many of the insights of modern generative grammar, or (2) develop a satisfactory linking theory for Minimalism (which I argue should incorporate stored structures, etc.).

It may be the case that syntacticians don’t care, because these concerns are not relevant to providing a theory of language as they’ve defined it. And I actually agree with this point – I don’t necessarily think that syntacticians ought to change the way they do business to accommodate these concerns. Here I strongly disagree with Jackendoff – I think there are good reasons to maintain the competence/performance distinction in pursuing a good description of the knowledge of language, because it allows them to focus and develop theories, whereas I think if they were to start incorporating all of this then they’d be paralyzed, because there is just too much going on in the world of behavior (modulo experimental syntax approaches, e.g. Sprouse, 2015, that are well-targeted within the domains of syntactic theory). I do recall a talk by Chomsky at CUNY 2012 where he pretty much said “hey guys stop looking at behavior there’s too much going on and you’ll get confused and go nowhere” – it doesn’t seem like patently bad advice.

However, I don’t think our studies of behavior and the brain have done nothing, and maybe it’s the right time for syntacticians, psycholinguists, and neuroscientists to start connecting everything up together.

There are also some very important reasons to care. If you want your theories to be taken seriously by psycholinguists and neurolinguists, you need to give them plausible ways in which your theoretical postulates can be used during online processing. The days of Friederici and Grodzinsky are numbered – Grodzinsky has already pretty much renounced the syntacto-topic conjecture (finally – Santi et al., 2015), and Friederici clings to what I think is a very hopeless position regarding Broca’s area and Merge (Zaccarella & Friederici, 2015). These are the only people with any clout in cognitive neuroscience who seriously engage with syntactic theory in generative grammar. Everyone else seems to be pretty much ignoring mainstream Generative Grammar. Is that what we want? I can imagine that this sort of stuff is important for intra- and inter-departmental collaboration, funding, etc.

There seems to be a decline in the purchase of generative grammar in the scientific community, which may only hasten with time and the eventual death of Chomsky. A good way to forestall or reverse this is by opening up a channel of communication with psychologists and neuroscientists through these specific linking theories (at least a sketch of one), not merely the promise of some possible linking theories (which appears to be what Norbert is telling me in our conversations). We need to actually make at least a rough sketch of a real linking theory in order to get this enterprise off the ground.

Secondly, it might be the case that introducing a plausible linking theory has ramifications for how you think about language and syntactic theory. There could be some very useful insights into syntactic theory to be gained once greater channels of communication are open and running.

I am worried that I cannot successfully combine my work in neuroscience with syntactic theory and generative grammar. The conference that I attend every year, the “Society for the Neurobiology of Language”, ought to be the most sympathetic place you’d find for GG to talk to neurobiology. In reality, there is hardly ever a peep about GG at these conferences. This ought to be a very disturbing state of affairs for you – it certainly is to me. The sometimes latent, sometimes explicit message that I keep receiving from my field is to stop caring about GG because it bears no relation to what we do. I reject this message, but in order to do meaningful work, I need to be armed with a good (sketch of a) linking theory. A big goal of this post is to solicit reactions and suggestions from syntacticians in developing this theory.


  1. I think a large portion of these problems stems from people conflating listedness and storedness. When generative syntacticians talk about the lexicon, they are talking about what needs to be listed (i.e., which units have properties that cannot be computed on the basis of the properties of their parts). I don't think most psychologists who (think they) are working on linguistic capacities use the term that way, though; they simply use it to refer to long-term memory of language-related things. Bizarrely (since they both should know better), Jackendoff and Culicover are among those caught in this confusion.

    To see how deep this goes, the whole bottom-up structure building issue might very well be a red herring if we view it through the listedness vs. storedness prism. It may very well be that language production as well as processing deploy precompiled(=stored) units in a left-to-right/top-to-bottom fashion – and meanwhile, those units are compiled in the first place (from listed atoms) only if they fit correctly into the bottom-up schema that "competence" people have developed. (A useful if imperfect analogy is my brain storing the output of long multiplication for pairs of operands that I have multiplied often enough, and only if that <operand1, operand2, product> triad respects the "competence" rules for multiplication in the first place.)

    This way of looking at things admittedly drives a deep, sharp wedge between competence and performance, and thus between theories of grammar and theories of grammar's use. But just because something is unfortunate – scientifically, and maybe even from a politics/funding perspective – doesn't mean it's not the truth.
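The multiplication analogy in this comment can be made concrete with a minimal sketch. The names `licensed`, `Store`, `compile_unit`, and `retrieve` are hypothetical and introduced only for illustration; nothing here is a claim about how the parser or the grammar is actually implemented. The sketch only shows the division of labor being proposed: a "competence" check decides which units may be stored at all, while retrieval from the store is a plain lookup that never consults that check.

```python
from typing import Dict, Optional, Tuple


def licensed(a: int, b: int, product: int) -> bool:
    """'Competence' check (illustrative): does the triple obey multiplication?"""
    return a * b == product


class Store:
    """'Performance' store (illustrative): precompiled triples in long-term memory."""

    def __init__(self) -> None:
        self._units: Dict[Tuple[int, int], int] = {}

    def compile_unit(self, a: int, b: int, product: int) -> bool:
        # A unit enters the store only if the competence check licenses it.
        if not licensed(a, b, product):
            return False
        self._units[(a, b)] = product
        return True

    def retrieve(self, a: int, b: int) -> Optional[int]:
        # Retrieval is a plain lookup; it does not re-run the competence check.
        return self._units.get((a, b))


store = Store()
store.compile_unit(12, 12, 144)  # licensed, so it is stored
store.compile_unit(12, 12, 150)  # not licensed, so it is rejected
print(store.retrieve(12, 12))    # the stored, licensed product
print(store.retrieve(7, 8))      # never compiled, so nothing is retrieved
```

On this picture, an experiment that probes only `retrieve` is at one remove from `licensed`, which is the worry the comment raises about methods that interface mostly with the store.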

    1. Would the above be at least approximately another way of saying that one of the things the grammar does is optimize the storage of the internally complex remembered chunks?

    2. I don't think so, no. Quite the opposite: the question of which precompiled units are and aren't in storage(=long term memory of linguistic objects) at a given moment depends on things like, e.g., what you have said/heard lately. (Where "lately" presumably stands for some complicated decay function.) This is exactly the kind of thing we don't want the grammar to be in charge of, since the grammar should not be a theory of, e.g., which TV shows one watches most often, etc.

    3. Perhaps I should have been clearer; by 'optimize' I mean 'reduce the amount of information needed to specify', so that for example you have a stored verb+object combination, and specify it in terms of a combination of such and such a predicate with a complement in a suitable abstract relation, you don't have to specify word order and case-marking information in languages where those are relevant.

      A corner that I don't think the inventory people have poked into is what happens with collocations in languages with more flexible word order than English, such as almost all Euro languages with V2-type phenomena ... I have checked out one subject-verb idiom in Icelandic, 'skóinn kreppir' 'the shoe pinches', i.e. things are uncomfortable/difficult, which indeed happily obeys V2 as in 'nú kreppir skóinn' 'now the shoe pinches', but I don't think anybody has any idea how the evidence on processing speed etc. applies to the word-order variants of these kinds of idiomatic chunks, let alone the compositionally regular but arguably stored collocations.

    4. Thanks Omer - what you're saying makes sense to me, and agrees quite a bit with what Norbert has been talking to me about. I suppose an even more fundamental syntactic theory than Minimalism would propose an operation that combines features to create the lexicon, do you think that's right?

    5. @Avery: In that case, yes, I'm certainly on board. Sorry for misunderstanding earlier.

      @William: I haven't thought about that, to be honest. There is something along those lines in the Nanosyntax camp, who take features to be the atoms of syntax (not exactly what you were saying, I know). This then forces them to adopt a "distributed" lexicon (in the sense of Distributed Morphology), where idioms are not part of the syntactic lexicon but a non-compositional "spellout" rule for a chunk of structure at the LF interface.

      Let me add (not as a direct response to Avery or William, but inspired by their comments) that even if true, the view I was espousing leaves much to be desired. E.g. it remains a mystery, on such a view, why incremental parsing would respect islands (to take one of many examples showing that parsing pays attention to factors we're used to thinking of as "grammatical"). I'm not sure the idiom thing is as much of a puzzle, though. Let's assume that the heuristic that assembles these stored treelets in a left-to-right fashion works along the lines proposed by Phillips (1996), and with that, has some approximation of "movement" from A-bar positions to A-positions to theta positions. I want to stress that this is not reconstructing all of syntax in the parser; I'm quite convinced that there are grammatical phenomena which cannot be "inverted" in this fashion, from bottom-up to top-down (PCC effects, omnivorous agreement effects). It's more like – to continue with the earlier imperfect analogy – the fact that I have built in shortcuts when multiplying multiples of 5, and those heuristics are valid in virtue of the slice of the competence that they mimic.

      More generally, my point was not that I have a worked out theory of how the parser uses precompiled treelets in real time, whose structure is licensed by the competence system, in such a way that gives rise to many effects we associate with the competence system itself. (Wouldn't that be something??) Not even close. My point was that, in principle, it might work this way, in which case anyone whose methodology interfaces mostly with the store (rather than lexicon + syntax) is at a degree of remove from grammar. And that, unfortunate as that may be, I'm not convinced that this is not the world we happen to inhabit.

    6. I agree with Omer's sketch of a view above. In fact, I've said as much in my Variation and Grammatical Architecture paper though that was focussed on the use of stored structures in situations of sociolinguistic variation. You can have something like construction grammar in reverse as a theory of use, but you need generative grammar as a theory of how to generate the structures that get routinised. 'Words' I think are a kind of privileged storage device for routinising and accessing complex syntactic structures.

    7. All right - so it seems like every syntactician I have encountered has endorsed the use of treelets during sentence processing. This is helpful to know. And I definitely agree that we need a generative grammar as a theory of where the treelets come from - this is what I found compelling about Frank's (2002) approach to these issues.

    8. Maybe we need to be more explicit about distinguishing theoretical syntax from what we could call applied syntax. The former being about the governing laws for syntactic structures and the latter being about how those structures are routinised and deployed (not just neuro, but psycho, processing, socio, etc). I think where construction grammar goes wrong is that it takes routinisation to be the start point. But routinisation of what? Linear strings? That just fails to answer any of the major questions about the human grammatical capacity (which is hence denied). But if we can clarify that we're not rejecting routinisation, storage, incrementality etc as components of a theory of use, then perhaps psych/neuro/socio people will be more inclined to listen to what we have to say about the laws. That's been my experience on the socio side for the last decade or so.

    9. Well maybe everybody should go off and do some work on making their favorite syntactic theory 'treelet-friendly' ('chunkable'?) ... TAG comes that way out of the box, but I'm intuitively sure that it's perfectly doable for all the others. Some parts of a possible proposal for LFG appear in my LFG 2008 conference paper, but certainly not the full job.

    10. @David

      I think it's helpful to not reject storage, incrementality etc., but even more helpful to advocate for it. It has certainly been confusing to hear generative grammarians express admiration for Friederici's work which appears to rest on a simple, major confusion of this point.

  2. A big oversight of mine in this post was that Townsend & Bever (2001) do propose an explicit linking theory between syntax and sentence comprehension. They posit that we heuristically understand a sentence, and then go back through it a second time with an explicit syntactic derivation to confirm the interpretation (an analysis-by-synthesis approach). Their slogan is "we understand everything twice". I think Phillips (2012) clearly points out the flaws in this proposal (namely, that we don't understand everything twice), but Townsend & Bever had the virtue of making an explicit linking theory.

  3. The post refers to the imminent availability of Shota Momma's (very interesting) 2016 PhD thesis. It's already available, as it happens:

  4. William: I'm unsure of why you're seeing a fundamental challenge here.

    "Representation" and "memory" are basically names for the same thing. Similarly, operations that merge or otherwise connect two smaller representations entail memory access operations. We just tend to use different terms when operating at different levels of analysis.

    Everybody is committed to having stuff stored in long-term memory. For some, the pieces are bigger, for others the pieces are smaller, but you've gotta have something. You seem to assume that the stored stuff comes in big pieces, and that this creates a challenge for linking theories, but I don't see the evidence or the challenge as so clear cut.

    As for how we make use of grammatical knowledge in real time, we know way more about this than we did 20 years ago. And the evidence is pretty consistent: it's hard to find good evidence of delay in access to any grammatical knowledge. (For sure, sometimes things don't work out perfectly, but that's another story.)

    The key feature that distinguishes comprehension and production tasks from what most traditional grammatical work does is uncertainty. Comprehension is the task of constructing a sound-meaning pairing with only the sound provided as guidance. Production is the task of constructing a sound-meaning pairing with only the meaning provided as guidance. The uncertainty inherent in these tasks brings challenges that are less of a concern for John or Jane Grammarian. But again I see no conflict.

    I don't see clear evidence that such-and-such grammatical theory is better suited to capturing real-time processes. Whenever we have dug into such claims, we've generally found them to be not so persuasive.

    I think that a lot of misunderstanding arises due to conflation of three different distinctions: (i) levels of analysis, (ii) tasks, and (iii) components of a cognitive architecture. Discussions of competence and performance almost always exacerbate things, because the terms mean different things in different contexts. And there is certainly plenty of mistrust between subfields. But from where I sit, I don't see a fundamental challenge. And among the growing community of researchers who are equally at home with linguistic flora and fauna and the ins-and-outs of parsing/production and memory models, I think the feeling is similar. There are lots of interesting things to work on, but we don't see a fundamental challenge.

    1. @ Colin

      "As for how we make use of grammatical knowledge in real time, we know way more about this than we did 20 years ago. And the evidence is pretty consistent: it's hard to find good evidence of delay in access to any grammatical knowledge. (For sure, sometimes things don't work out perfectly, but that's another story.)"

      I definitely agree with this statement, as I have been quite persuaded by the careful research program you and others have pursued on this point. I do have a major caveat, though, which is constantly my confusion whenever I listen to a talk on this topic: what do you mean by "grammatical knowledge"? This is a general and important issue in my opinion - it is hard to talk about these things in the abstract without specific cards on the table as to what grammatical knowledge constitutes. I liked your dissertation because it was clear on the grammatical model, although that model does not seem adequate to account for many of the psycholinguistic findings on prediction, structural priming, etc., regardless of whether it retained the empirical and conceptual insights of bottom-up Minimalism.

      So whether or not I am posing real "challenges", I really only care with respect to what a reasonable picture of the language faculty looks like, so that I can try and figure out how the different components of FL could be implemented in a brain and then test these hypotheses. So would you endorse a core UG that creates syntactic objects bottom up, which can then be stored in long-term memory and retrieved online for sentence production and comprehension? Or do you think there is a better model of FL out there?

    2. @ Colin #2

      "I don't see clear evidence that such-and-such grammatical theory is better suited to capturing real-time processes. Whenever we have dug into such claims, we've generally found them to be not so persuasive. "

      I would love to see/hear the specifics of this! To me, a minimalist grammar is worlds apart from a TAG grammar. A TAG grammar has no specification for how the treelets got there in the first place. A minimalist grammar has no representations or devices that provide for predictive processing, structural priming, etc, regardless of the directionality issue. This is why I really liked Frank's 2002 proposal, which acknowledges that TAG needs a theory of where the treelets come from, and incorporates a minimalist syntax as a theory of where the treelets come from.

    3. @ William #1

      Yes, in 1996 I was a syntactician who was dabbling in parsing work, and had little idea of computational models, memory, prediction, illusions, etc. etc. Many of the things that we worry about nowadays weren't even on my radar back then.

      Grammatical model for real-time computation. Choose your favorite formalism, and then put the pieces together in the order that people do so in real time. It should be possible with minor tweaks in most formalisms. Minimalists tend not to do so, but it should be fine to merge in a non bottom-up fashion if they're so inclined. (Shevaun Lewis and I discussed this in a 2013 paper; different than the 2015 paper that you mention.)

      Prediction. I don't see this as a challenge for the linking theory. It's just another way of saying that the construction of the mental representation can get ahead of the words in the input in comprehension.

      Priming. For structural units to be primed they need to be stored. There are claims that the extant evidence favors specific types of syntactic models. Phoebe Gaston, Nick Huang and I looked into this a little this summer. Our impression is that the existing evidence is interesting, but compatible with a wide range of grammatical models.

      Perhaps where I'm losing the thread is on the excitement about treelets. I'm assuming that (i) everybody has some way of encoding what their minimal units combine with; (ii) treelets are simply one notation for encoding these combinatorics; (iii) it's an independent question whether the treelets are bigger or smaller; (iv) the evidence on whether people store and use larger treelets is not clear cut; and (v) it's no big deal if people store some often used combinations.

      @ William #2

      We address the (in)decisiveness of different types of psycholinguistic evidence for grammatical models in Phillips & Wagers 2007 (for wh-movement) and Phillips & Parker 2014 (for ellipsis).

      A minimalist grammar has no need to say anything special about predictive parsing. Prediction is simply structure building that doesn't wait for phonological support. Priming does diagnose storage, but the evidence that the stored units are big is not so clear.

    4. @ Colin

      "Prediction. I don't see this as a challenge for the linking theory. It's just another way of saying that the construction of the mental representation can get ahead of the words in the input in comprehension. "

      "A minimalist grammar has no need to say anything special about predictive parsing. Prediction is simply structure building that doesn't wait for phonological support."

      I disagree. The prediction is always directed, is it not? I.e., what are you predicting? And it seems as though people are often predicting the existence of abstract structural nodes, traces (or formal equivalent), etc. I don't see how this could be encoded in a minimalist grammar without adding some kind of device that indicates what is to be predicted, i.e., some kind of treelet or phrase structure rule. The minimalist grammar on its own is therefore not enough. Once you allow the widespread use of phrase structure rules or treelets, though, then it seems to me that whether the syntactic derivation is bottom-up or top-down is irrelevant. This is because the system is making widespread use of stored treelets in processing anyway, so why does it matter whether those treelets were constructed bottom-up or top-down?

      I don't think the treelets have to be huge to pose this central question - as long as you have something like S -> NP VP and its variants, then I think the picture I have described follows naturally.

  5. This comment has been removed by the author.

  6. You might be interested in the work of John Hale, who has formulated explicit linking theories between minimalist grammars and behaviour (surprisal; entropy reduction). He has been working recently to relate this to fMRI data.

    I have also proposed a memory based linking theory (adapted from Joshi's TAG based one and related to Gibson's DLT), the logical space surrounding which has been rigorously explored by Thomas Graf.

    This is all predicated on the happy fact that there exist correct parsing algorithms for minimalist grammars, as first set out by Henk Harkema in his UCLA PhD thesis, where he shows that bottom-up and (predictive) top-down can be formulated in this context.

    Recently Stabler has been exploring the merits of a probabilistic version of Henk's Top Down parser.

    Makoto Kanazawa has shown that a very general construction gives predictive Earley parsers for essentially all restrictive grammar formalisms.

  7. William - thanks for the provocative thoughts!

    I totally agree with the challenge of specifying linking hypotheses, and also with the general sense of "the room" as to the importance of chunking. I think things might be less bleak than they look at first glance(??) We (me, John Hale, Christophe Pallier etc.) just began an NSF-funded project to push forward on the size (and kind) of these chunks (#1607251, #1607441). We are looking both at some of the standard NLP approaches vis a vis establishing a cutoff for word-sequences and also at the view that what gets chunked are parser operations (following Newell et al. on SOAR; applied to parsing in Lewis' 1993 dissertation and Hale's 2014 book). We test a few different grammars for defining these chunks (MG, TAG etc.) in the same project. Upshot: I think there are some promising linking hypotheses in the computational literature, but they haven't been applied to neuro data yet!

    I disagree that there is "no linking theory between grammar and behavior for a Minimalist grammar aside from Phillips (1996)." Stabler, Hale, Kobele, Hunter and others have been pushing forward on a family of automata for parsing MGs. These can be as predictive as your sentence-processing theory demands (i.e. they span the GLC lattice). The bigger point goes back to Stabler's 1991 "Avoid the Pedestrian's Paradox" paper. That paper really nicely demonstrates that the "directionality" of the grammar and the directionality and/or predictiveness of the parser are totally orthogonal. We take a stab at applying these tools to query fMRI data in a recent paper (doi: 10.1016/j.bandl.2016.04.008).

    Finally - can you embed links in these comments??

    1. OK so Greg scooped most of what I wanted to say while I futzed around with the (very small) comment box.

    2. @ Greg & Jon

      Thanks for the thoughts and references. I have dabbled a bit into this computational parsing literature, which is quite difficult for me to wrap my head around, but my general impression is that this literature does not make clear ontological claims about the faculty of language in light of syntactic theory and psycholinguistics. In other words, what does this literature say are the components of sentence processing that incorporate syntactic theory? This is what I mean by a linking hypothesis (as in Phillips 1996 and Townsend & Bever 2001).

      Is it the case that this computational work makes these kinds of ontological claims? If so, is there a good summary of them? I spent much of this morning reading through Harkema (2001) and could not come away with clear claims of this sort.

    3. I think your question aligns with the target of John's 2014 book, which asks in what ways parsing automata might serve as theories of syntactic processing and develops a set of linking hypotheses (node counts, search path, surprisal, entropy reduction) to connect parser states with measures of processing... i.e. linking hypotheses.
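
      To make two of these linking hypotheses concrete, here is a minimal sketch of surprisal and entropy reduction as they are usually defined: surprisal is the negative log probability of a word given the words before it, and entropy reduction is the (non-negative) drop in uncertainty over analyses from one word to the next. The function names and the toy numbers are assumptions for illustration only; they are not taken from Hale's book, and the sketch does not cover node counts or search path.

```python
import math
from typing import Sequence


def surprisal(prefix_prob_before: float, prefix_prob_after: float) -> float:
    """Surprisal (in bits) of a word: -log2 P(word | preceding words).

    The conditional probability is written as a ratio of prefix
    probabilities: P(w_1..w_i) / P(w_1..w_{i-1}).
    """
    return -math.log2(prefix_prob_after / prefix_prob_before)


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a distribution over candidate analyses."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def entropy_reduction(before: Sequence[float], after: Sequence[float]) -> float:
    """Drop in entropy caused by a word; increases in entropy count as zero."""
    return max(0.0, entropy(before) - entropy(after))


# Toy numbers (hypothetical): the prefix probability drops from 1.0 to 0.25,
# and two equally likely analyses collapse to a single one.
word_surprisal = surprisal(1.0, 0.25)
word_er = entropy_reduction([0.5, 0.5], [1.0])
```

      In this picture the grammar supplies the probabilities over analyses, the parser state fixes which analyses are still live, and the two functions map successive states onto a predicted processing measure.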

    4. So the parsing automata are the theory of what constitutes sentence processing? What is the theory of these automata? Where and why do they get their properties? Are these things in the Hale 2014 book?

      Perhaps your notion of linking theory is the normal one - a link between parser states and behavioral measures. So this is my fault for not making my intentions perfectly clear. What I am interested in is a theory of what constitutes FL, and hopefully one that aspires to explanatory adequacy - why does FL have the properties that it does?

      My sketch of a theory is that there is a primitive lexicon, a structure-generating operation that creates treelets out of this lexicon, and a memory system that retrieves and manipulates treelets. Is there a comparable theory or sketch of one from the literature you describe?

    5. An automaton is a machine that recognizes whether a string conforms to a grammar. Paired with machines for interpretation (e.g. matching grammar-units with terms in a model) etc. you can approach a computational theory of sentence comprehension. These are covered in standard textbook intros (and John's book). They might live on the "memory system" part of your sketch in as much as they describe procedures by which memory retrieval and encoding operations are simultaneously conditioned by stimulus input and top-down knowledge (grammatical and lexical).

      Do you lump processing operations in with FL? Or, is FL strictly a competence theory? Either way, I think there are at least two levels of linking hypotheses in play. This is classical Marr: some algorithm manipulates linguistic representations that are defined by the grammar+lexicon, and some brain-stuff implements that algorithm. Automata theories offer a linking hypothesis of the first kind (which takes the shape of: given grammar X and input Y, the current parser state is Z). We're now working on testing linking hypotheses of the second kind (which take the shape of, e.g., brain signal X should vary in proportion to F(A,B) where A,B are successive parser states.)

      Aside: I think that "structure-building" can be a bit ambiguous in this context, as it gets used by syntacticians to describe rules for determining well-formedness, and by psycholinguists to describe the process for recognizing/interpreting some input. I suppose, following the comments above, that "Lexicon" can be ambiguous in a similar way: as the list of undecomposable arbitrary pairs (aka DM's encyclopedia??), or as a memory store that includes chunks (however we define those.)

    6. @William: I'm not sure what you mean by "components of sentence processing that incorporate syntactic theory".

      In the computer science literature, the grammar is a specification that the parser satisfies. The grammar specifies which sounds go with which structures/meanings, and the parser is an algorithm that maps sounds to the structures/meanings the grammar says they should have.

      Usually, as no one wants to write ad hoc parsing algorithms for each individual grammar, there is a general procedure to construct a parser given a grammar, which then makes regular use of the mechanisms offered by the individual grammar. (Most often this satisfies things linguists call 'strict competence hypotheses'.) Hale's parsing automata are a traditional way of implementing one of these general procedures to convert grammars to parsers.
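
      As an illustration of "a general procedure to construct a parser given a grammar", here is a minimal sketch. It is not Harkema's or Hale's algorithm and it does not handle minimalist grammars: it assumes a toy context-free grammar with no left recursion, and the names `make_recognizer` and `TOY` are made up for the example. The point is only that the procedure is written once and the grammar is passed in as data.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# A grammar maps each nonterminal to its alternative right-hand sides.
# Any symbol that is not a key of the grammar is treated as a word.
Grammar = Dict[str, List[Tuple[str, ...]]]


def make_recognizer(grammar: Grammar, start: str) -> Callable[[str], bool]:
    """Build a top-down, backtracking recognizer for the given grammar.

    Assumes the grammar has no left recursion; otherwise this loops forever.
    """

    def expand(symbol: str, words: Sequence[str], i: int) -> Iterator[int]:
        """Yield every position j such that `symbol` derives words[i:j]."""
        if symbol not in grammar:  # terminal: must match the next word
            if i < len(words) and words[i] == symbol:
                yield i + 1
            return
        for rhs in grammar[symbol]:  # predict each rule for this category
            yield from expand_seq(rhs, words, i)

    def expand_seq(rhs: Tuple[str, ...], words: Sequence[str], i: int) -> Iterator[int]:
        """Yield every position reachable by deriving the symbols in `rhs` in order."""
        if not rhs:
            yield i
            return
        for j in expand(rhs[0], words, i):
            yield from expand_seq(rhs[1:], words, j)

    def recognize(sentence: str) -> bool:
        words = sentence.split()
        return any(j == len(words) for j in expand(start, words, 0))

    return recognize


# Hypothetical toy grammar, for illustration only.
TOY: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("the", "N")],
    "N": [("dog",), ("cat",)],
    "VP": [("sleeps",), ("sees", "NP")],
}
recognize = make_recognizer(TOY, "S")
```

      Swapping in a different grammar changes which strings are accepted without touching the procedure, which is the sense in which the grammar acts as the specification that the parser satisfies.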

      This is perhaps more familiar to you in the guise of David Marr's (or Poggio's or Pylyshyn's or ...) levels interpretation of grammars and parsers, which I would characterize glibly as 'the grammar is (a high level description of) the parser'.

      Anderson's program of 'rational analysis' seems like it might be related to your goals of understanding 'explanatory adequacy'.

      > Is there a comparable theory or sketch of one from the literature you
      > describe?
      Yes. The same parsing algorithm can be presented in many ways, each of which can be useful in its own way. Parsing schemata (used by Henk) present parsers in terms of inference rules, which emphasize the high-level structure generating operations, while being agnostic about the organization of the memory system. Automata (as used by John) present a lower level picture of the same algorithm, which emphasize the transitions between parser states.

      I think taking a look at Stabler's papers "The epicenter of linguistic behaviour", and then "Memoization in top-down parsing", would be useful for a very concrete philosophical grounding of these questions. Then his very detailed "Two models" paper presents a minimalist parsing algorithm in a very helpful way.

    7. @ Greg & Jon

      I will try to spend more time looking at this literature. Thanks again for the additional references.

      "In the computer science literature, the grammar is a specification that the parser satisfies." "This is classical Marr: some algorithm manipulates linguistic representations that are defined by the grammar+lexicon, and some brain-stuff implements that algorithm"

      I believe I understand the philosophy of this approach. This approach makes a hard Marrian division between grammar as computational-level description and parser as algorithmic-level description of language. If I were to rephrase things a bit, the grammar isn't "real" - the parser is real, and the grammar is just a static way of describing what the parser does. Is that fair to say?

      I believe this is a somewhat common perspective on syntactic theory, but I don't think it's how many syntacticians would look at it (I'm sure this ground has been much trodden before). At least when I read Chomsky, I don't get this hard Marrian division (aside from the core texts, in at least one interview he has highlighted the difficulties of applying Marr to syntactic theory). What I get is that the grammar is simply one piece of FL among many others, parts of which may have to be re-analyzed because other parts of FL (such as the memory system) might account for some of the same phenomena. So the grammar is "real" - the question for those looking to understand its role in sentence processing is to determine what time scale it operates under, and how much of it needs to be re-analyzed in certain ways.

      I think Phillips (1996) and Lewis & Phillips (2015) are persuasive in looking at the grammar as a real component of FL rather than simply a static description of what representations FL builds. I disagree with Colin about the right way to incorporate the grammar (as identified by e.g. Chomsky 1995) into a larger theory of FL - he would like to see it as a real-time system of the adult FL, and I would rather see it as only being seriously operative during language acquisition.

      Would you say that this perspective is incompatible with the one espoused in the literature you are citing?

    8. @William: It is not incompatible in principle, but the other perspective is slightly bizarre if you try to make sense of it in these terms.

      I believe that Chomsky's work is compatible with either perspective, and that Fodor was the one who really pushed the perspective you are talking about. Norbert posted a month or two ago about him adopting Marr's perspective (at least that's how I understood it).

      My favorite interpretation of this perspective was given by Thomas Graf in a comment on this blog somewhere, and allows us to view it as a notational variant of what is espoused in the literature I am citing.

      My least favorite interpretation is what it sounds like most people intend, and I outline this below.

      There is a parser P. It implements a specification which is different from the grammar (think Bever's NVN). Let's call this the parser's grammar, PG. (Unfortunately in this worldview, no one studies PG, so we don't know what it is like.) PG, qua P, is causally implicated in a host of behaviour.

      There is also a grammar, G. The parser P may 'consult' G on occasion. It is not clear what consulting in this sense is. (Graf's interpretation is that consulting is just checking to see whether a particular lexical item is present). G is causally implicated in off-line acceptability judgments. It is completely mysterious how though, as PG governs the structures assigned to parsed strings, and these could be vastly different from what G would say about them.

      From the perspective of someone who takes this view seriously, I am saying that G is actually PG. I want to say this, because
      1. it works
      2. it takes syntax seriously
      3. it is simple
      4. there has never been any reason given not to

    9. @ Greg

      I agree with your criticism of the not good stuff, and I agree that it isn't good. The proposal that I'm making is that the parser uses objects directly created by the grammar, but that the grammar directly created these objects during the person's language development. Once these objects were created, the grammar kicks its feet up for the most part with respect to sentence processing. The parser, consisting of memory operations and treelets, does most of the work of online sentence processing.

      I don't think what I'm saying is much different from what you're saying, in fact. The grammar IS the specification of the parser, because it creates the objects that the parser uses. Rather than the grammar just being another way to describe the parser in static terms, the grammar is actually a real time processing device that is more active during language acquisition (and perhaps of internal thought) than in real-time adult sentence processing.

      From Aspects, chapter 1:

      "Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance. ...

      To study actual linguistic performance, we must consider the interaction of a variety of factors, of which the underlying competence of the speaker-hearer is only one. In this respect, study of language is no different from empirical investigation of other complex phenomena." [emphasis mine]

      I think the outline is quite clear, and the correct perspective on the theory of grammar. I think early syntactic theories assumed the perspective that Colin takes - the grammatical theory is tightly related to what happens in real time during adult sentence processing. I am only making a slight adjustment - syntactic operations create the objects of sentence processing in real-time, but once these objects are created, the grammar doesn't need to re-create them every single time the adult speaker-hearer produces or understands a sentence.

    10. @Greg. I share your view on the view that you're not sympathetic to. In your list of reasons to assume G is PG I would disagree with #4 ("there has never been any reason given not to"). I think that various people going back decades have believed that there are reasons to put more daylight between G and PG. I just don't think that the arguments stand up to closer scrutiny.

      @William. I now think I see what you're arguing for. But what's the motivation for the additional stuff? The gap between the grammar and the stuff that you use in real time comes at a great cost, and with unclear empirical motivation. I think I'm not seeing the payoff.

      (Note that we should beware of being comprehension-centric. Comprehension is just one of the tasks that people can carry out in real time using their structure building ability.)

    11. @ Colin

      By additional stuff, you mean stored structures (and whatever is necessary to modify the stored structures appropriately)? The motivation is empirical - it just seems to be true that you can't get off the ground in describing parsing and production adequately without using treelets or their equivalent (I think you acknowledge this is true).

      Once you allow the existence of some treelets, the parsimony bridge has been crossed, and I see no reason to limit the use of treelets at all. And it does seem extremely inefficient if the grammar has to reconstitute the constituent structure of sentences every single time you produce/comprehend a sentence.

  8. @William: I may have missed this, but what do you see as the main processing evidence for the necessity of having treelets in the grammar? I know you linked to Demberg & Keller and Momma, but perhaps you could summarize what you have in mind?

    1. Momma argues that the units of structure building (in production) are actually quite small, contra claims in the production literature that large pre-formed templates are retrieved. His argument is based on the time course of syntactic priming effects: facilitation due to priming is localized to the primed constituent. He uses the term "treelets" just as a way of referring to little pieces of structure, not presuming that they are larger assemblies of smaller pieces.

    2. @ Tal

      One of the main motivations for the existence of treelets is the fact that sentence processing is predictive in a number of ways: prediction of syntactic categories, gap/trace locations, agreement features, island structures, etc. A natural vehicle for encoding these predictions is a treelet or equivalent device - by retrieving the treelet and integrating it into the current structure, you have your prediction of whatever information was on the treelet.

      Another piece of evidence is structural priming and related phenomena (one type of related phenomenon is described in Lau et al., 2006 - there seems to be a prediction of an ellipsis configuration, or rather the expectation of ellipsis changes the syntactic information that is predicted). An obvious way to encode information about specific constructions is on a representation that embodies that construction.

      In addition, there are other reasons to motivate treelets. Idioms are one example - I don't think there is any good way to talk about idioms without positing the long-term storage of a complex object. I have never looked into Construction Grammar, but I have a feeling that they like constructions for some reason - treelets are a way of capturing whatever the phenomena are that motivate talking about constructions.

      Finally, I just don't see any conceptual arguments against treelets. In a Minimalist grammar, as Omer commented at the top, the goal is to reduce everything down as much as possible to what needs to be listed. This does not in the slightest preclude long-term storage. In fact, I feel that precluding storage of structures is a stipulation that actually simplifies matters when removed.

    3. In an old version of Shota's dissertation, he motivated the limitation in size of treelets by a memory constraint. That is, he suggested that treelets should minimize similarity-based interference by not having multiple structural nodes of the same syntactic category - don't predict two NPs, but NP and VP are fine, because you won't be confused in where to insert the retrieved lexical item. I think this was an excellent suggestion, although at first glance I could not find his proposal in the published dissertation!

      The important point in this proposal is that the grain size of treelets is not limited by the grammar but by memory limitations. This is in accordance with my proposal, which says that the form of treelets is determined by efficiency of memory/processing.

    4. Thanks! I think that there are two types of arguments here:

      1. Linguistic arguments for constructions / treelets, mostly from noncompositional meaning or idiosyncratic syntax (the work that's associated with Fillmore, Jackendoff, Goldberg, and others). Those are not directly related to processing. I haven't read the Construction Grammar literature in a while, but I remember I found the arguments interesting (though I wouldn't be surprised if there are Minimalist treatments of "let alone", "the more... the more" and the other celebrated examples).

      2. Processing phenomena that could be produced by a memoization layer on top of a treelet-free grammar (as I think was mentioned before) - i.e., the treelet-free grammar produces stuff, then "performance" components can decide which frequently used treelets to store, regardless of compositionality (e.g., to minimize tree-construction effort). Omer, David and others don't seem to object to this type of storage of large units.
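
      A minimal sketch of what such a memoization layer could look like, purely for illustration: a toy "grammar" that only knows how to merge two objects, wrapped in a performance component that stores any structure it has built often enough. The class name, the frequency threshold, and the right-branching derivation are all assumptions made up for the example, not a claim about any published model.

```python
from collections import Counter
from typing import Dict, Tuple


class MemoizingBuilder:
    """Treelet-free structure building plus a frequency-based store."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold          # builds needed before storing
        self.counts: Counter = Counter()    # how often each span was built
        self.store: Dict[Tuple[str, ...], object] = {}  # stored structures
        self.merges = 0                     # times the grammar actually ran

    def merge(self, left: object, right: object) -> tuple:
        """The grammar's only operation: combine two objects."""
        self.merges += 1
        return (left, right)

    def build(self, words: Tuple[str, ...]) -> object:
        """Return a right-branching structure over `words`.

        Stored structures are retrieved whole; everything else is derived
        from scratch, and frequently derived structures get stored.
        """
        if words in self.store:
            return self.store[words]
        if len(words) == 1:
            return words[0]
        tree = self.merge(words[0], self.build(words[1:]))
        self.counts[words] += 1
        if self.counts[words] >= self.threshold:
            self.store[words] = tree        # storage ignores compositionality
        return tree


builder = MemoizingBuilder(threshold=2)
phrase = ("a", "friend", "of", "mine")
first = builder.build(phrase)   # derived: 3 merges
second = builder.build(phrase)  # derived again: 6 merges in total, now stored
third = builder.build(phrase)   # retrieved: no further merges
```

      The grammar itself never mentions stored structures; whether a fully compositional phrase ends up in the store depends only on how often it is built.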

      So while the linguistic evidence may be convincing, I'm not sure we have processing evidence either way (I agree with you that it's not clear why the burden of proof should be on the pro-treelet-in-grammar rather than the anti-treelet-in-grammar crowd). Perhaps the argument for treelets in the grammar would be more compelling if we had a correlation between the linguistic and psycholinguistic arguments, where for example tree fragments that have some noncompositional meaning show priming effects but equally frequent but unremarkable tree fragments do not show those effects?

    5. @ Tal

      "Processing phenomena that could be produced by a memoization layer on top of a treelet-free grammar (as I think was mentioned before) - i.e., the treelet-free grammar produces stuff, then "performance" components can decide which frequently used treelets to store, regardless of compositionality (e.g., to minimize tree-construction effort)."

      I would like to think that I am advocating for something quite similar to this. However, I do think once we specify the nature of interaction between grammatical objects and the memory/attention system, things traditionally analyzed as grammatical might fall out of the interaction of treelet formation and memory operations. I have a hunch that this is the case for (at least some) syntactic islands.

    6. Just to clarify something: my view of (this thing we have been calling) treelets is just long-term memory where the remembered object is one that happened to be constructed by the linguistic system. It's logically possible that our long-term memory is incapable of storing linguistically complex objects except in those instances where these objects have properties that aren't computable from properties of their parts (i.e., in those instances where a linguist would posit that the object in question is listed). But that strikes me as a conceptually odd position to take, and an empirically dubious one.

      To echo Marantz, consider "any friend of yours is a friend of mine." I remember that I have heard this expression before. What is it that I am remembering when I remember this? Is it a string of phonemes? A string of graphemes? I think it is eminently reasonable to say that what I remember is an object constructed by the linguistic system. Now, only someone who fundamentally misunderstands the point of generative grammar would place this expression in the lexicon. (I'm assuming here that this expression has no properties that don't come from properties of its parts except that I remember it; if that's not the case for this expression, then just pick a different, more appropriate example.)

      We can then ask separate questions like, "Is the lexicon neuro-physiologically real, or is it just the subset of our long-term memory of linguistic objects for which the stored object has non-predictable properties?" That's a valid question, I think. But even if the answer turns out to be "no, it's just the subset of ...", the distinction remains important from a theoretical perspective. That's because the goal of the whole endeavor is to characterize the generative property: our ability to assign structure and interpretation to utterances we have never encountered before. The lexicon is the minimal amount of long-term storage you need to have to accomplish this task. And I would perform the same (modulo reaction times / priming / etc.) on assigning structure & interpretation to "any friend of yours is a friend of mine" whether I remembered that I had encountered it before or not.

    7. @Omer

      Thanks! This is a much better way of stating what I was trying to get at earlier re the potential ambiguity of "Lexicon"

    8. @Omer: I am wondering if there is a problem in saying that the lexicon is just a subset of long-term memory. It seems to me that saying so might (?) force one to commit to a certain view of the elements of the lexicon. The doubt came about because I was thinking about this question: what is the data-type of a long-term memory object?

      Take one specific view, in particular roots in DM. They are seen to be structure-free or category-free. But treelets presumably have structural or category information. So, what exactly does it mean to say that a repository (LTM) can have both completely structure-free items and those with structure? They are fundamentally different types of objects.

      To go on a detour a bit with numbers: It is a bit like saying an array can have both integers and real numbers. It is true, the value of an integer can be stored as an equivalent real number, but then it is not an integer anymore.
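
      The number analogy can be made concrete with a typed array. This is only a sketch of the analogy, using Python's standard `array` module; it says nothing about how long-term memory actually works.

```python
from array import array

# An array typed for reals ("d" = C double) accepts the integer 1,
# but what it stores is the real number 1.0.
reals = array("d", [1, 2.5])
stored = reals[0]

same_value = stored == 1                   # the value is preserved
still_an_integer = isinstance(stored, int)  # the type is not


def int_array_accepts_real() -> bool:
    """Check whether an integer-typed array ("i") will take a non-integer real."""
    try:
        array("i", [1, 2.5])
    except TypeError:
        return False
    return True
```

      A single repository is possible only by converting one kind of item into the other's format, or by giving both a common container type, which is the commitment at issue.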

      I wonder, similarly, if by saying we can store roots and treelets in the same repository, we are making commitments to the data-type, or representational format of the members of the repository. In which case, saying that you can have a repository of both roots and treelets might mean that you are committing to a view wherein roots have some abstract unfulfilled (unvalued?) structure. Maybe, that is what (some types of) features are somehow.

      Not that that is bad/wrong, but I am just trying to point out, that the subset view of the lexicon might not come for free, and it may involve other commitments.

      Perhaps, if the concern I raise is legitimate, one could also say that “no, roots really are structure-free in a meaningful sense, and we also really do store treelets”. In which case, I think to keep a consistent data-type, one might be forced to accept two separate repositories, one for roots, and the other for treelets. But, this is not really the subset view of the lexicon anymore.

    9. @Karthik: You're right that the view I was sketching assumed a somewhat traditional lexicon, in that it is (1) non-"distributed" (in the DM sense), and (2) the items in the lexicon are of the same type as linguistically-constructed objects.

      How much of a departure from this would DM entail? I'm not sure. If structureless, category-less roots of the DM kind are cognitively real, then by definition long-term memory is capable of storing them. The subset question you raise then boils down to whether there is a supertype that both roots and linguistically-constructed objects belong to (in which case the answer to the question is still yes), or there isn't (in which case the answer is no). I must confess that I don't know how to begin addressing that question.

      DM comes with other commitments, too. Apart from the list of roots, there are two more "lists" (Vocabulary, Encyclopedia), each of which – it seems – has its own unique data type. But again, whether these can be seen as mere subsets of general long-term linguistic storage depends on how all of these entities are typed. You could imagine something akin to HPSG where "rules" and "constituents" and "constraints" are all objects that belong to a giant type hierarchy that is able to unify what initially seem to be disparate kinds of data.