Faculty of Language: Brains and syntax : part 1

Friday, September 2, 2016

Brains and syntax : part 1

William Matchin has been a post doc at UMD working with Ellen Lau. He is interested in how brains use grammars in real time. He has written a paper describing the frustrations that a neuroscientist has when approaching syntacticians for help. He has also provided some positive suggestions for how to move forward. I present his thoughts in two parts. First the problem as he sees it and, sometime over the weekend, I will post his suggested solutions. It's interesting stuff.

***

Linking syntactic theory to behavior and brains

It is very frustrating to work in the cognitive neuroscience of syntax within the mainstream Generative Grammar (GG) framework when there are essentially no real linking theories on offer between syntactic theory and online sentence processing. At present, the connection between syntactic theory and whatever people do when they hear and produce sentences is completely opaque, as is how the mature language system develops during acquisition. My present point is to underscore how essential it is to establish such a linking theory. I truly believe that any cognitive neuroscience of language that seeks to incorporate the insights of Generative Grammar absolutely needs such a linking theory. Of course, cognitive neuroscience of language can proceed without incorporating the insights of syntactic theory, and this is often done – most people working on syntax attempt to localize some vague, a-theoretical notion of “syntactic processing” without clearly defining what this is. An even clearer example of departure from syntactic theory is recent work that posits certain brain areas as “core” language areas, without defining what language is beyond “you know it when you see it” (Fedorenko et al., 2011). Is that what we want for neuroscience investigations of language – near total disregard for GG? I don’t. The whole reason I am (was) here at UMD is because I find syntactic theory very deep, both for descriptive and explanatory adequacy, and I in fact think that the Minimalist program in particular may allow bridging the gap between linguistics and the fields of neuroscience and evolutionary biology.

There are reasons for this disregard, a major one being that nobody talks about how a Minimalist grammar is used. We certainly have plenty of insightful work in acquisition and psycholinguistics that tells us when children know certain grammatical constructions (e.g., Lukyanenko et al., 2014) or when certain grammatical constraints are used online (e.g., Phillips 2006), but we don’t have any strongly plausible suggestions as to what happens mechanistically. For example, it seems that people don’t search for gaps inside of islands, but why don’t they? How is the grammatical knowledge deployed in real time such that people don’t try to find a gap inside an island? This issue is a fundamental question for my line of work, and one that remains unanswered. For a related example in the world of brains, there is a very close connection between the syntactic properties of sentences and activation in language-relevant brain areas (e.g., Pallier et al., 2011; Matchin et al., 2014) – but what does this mean with respect to the function of these brain areas? Are these areas “Merge areas”, or something else? If something else, what is our theory of this something else (that takes into account the fact that this area cares about structure)? This sort of question applies to pretty much every finding of this sort in psycholinguistics, language acquisition, and cognitive neuroscience.

My work rests on experiments of language use, in normal people during brain scanning or in patients with brain damage. I attempt to explain why brain areas light up the way that they do when people are producing or comprehending language, or why patients have particular problems with language after damage to certain brain areas, and I try to connect these notions with syntactic theory. However, it is very hard to proceed without knowing how the postulates of syntactic theory relate to behavior. Here is just a sample of major questions in this regard that exemplify the opacity between syntactic theory and online processing:

· When processing a sentence, do I expect Merge to be active? Or not?

· What happens when people process things less than full sentences (like a little NP – “the dog”)? What is our theory of such situations?

· Do derivations really proceed from the bottom up, or can they satisfactorily be switched to go top-down/left-right using something like Merge right (Phillips 1996)?

· What happens mechanistically when people have to revise structure (e.g., after garden-pathing)?

· Are there only lexical items and Merge? Or are there stored complex objects, like “treelets”, constructions, or phrase structure rules?

· How does the syntactic system interact with working memory, a system that is critical for online sentence processing?

These things are not mentioned in syntactic theory because of the traditional performance/competence separation (Chomsky, 1965). There did use to be some discussion of these linking issues in work that sought to bridge the gap between syntactic theory and online sentence processing (e.g., Miller & Chomsky, 1963; Fodor et al., 1974; Berwick & Weinberg, 1983), but that no longer seems to be the case, at least for Minimalism. In order for me to do anything at all reasonable in neuroscience with respect to syntax, I need to have at least a sketch of a theory that provides answers to these questions, and such a theory does not exist.

There are syntactic theories on the market that do connect (somewhat) more transparently to behavior than mainstream GG – those of the “lexicalist” variety (e.g., Bresnan, 2001; Vosse & Kempen, 2000; Frank, 2002; Joshi et al., 1975; Lewis & Vasishth, 2005), with the general virtues of this class of theory, including the very virtues of transparency to online behavior, summarized by Jackendoff (2002) and Culicover and Jackendoff (2005; 2006). In my mind, Jackendoff and Culicover are right on the point of transparency – this kind of grammatical theory does connect much better with what we know about behavior and aphasia. At the very least, it seems to me impossible to even get off the ground in discussions of psycholinguistics, neuroimaging or aphasia without postulating some kind of stored complex structures, “constructions” or “treelets”, or perhaps old-fashioned phrase structure rules that might fill an equivalent role to treelets (see Shota Momma’s 2016 doctoral dissertation for an excellent review of this evidence for psycholinguistics, hopefully available soon :)). Minimalist grammars do not provide this level of representation, while lexicalist theories do.

Here is a set of fundamental observations or challenges from psycholinguistics and neurolinguistics that any kind of linking theory between syntax and online sentence processing should take into account:

· Online processing is highly predictive and attempts to build dependencies actively (Omaki et al., 2015; Stowe, 1986 – filled-gap effect)

· Online processing very tightly respects grammatical properties (e.g., phrase structure, Binding principles, Island structures) (Lewis & Phillips, 2015)

· Stored structures of some kind seem necessary to capture many behavioral phenomena (Momma, 2016; see also Demberg & Keller, 2008)

· The memory system involved in language appears to operate along the lines of a parallel, content-addressable memory system with an extremely limited focus of attention (Lewis et al., 2006)

· The main brain “language areas” are highly sensitive to hierarchical structure and other grammatical properties (Embick et al., 2000; Musso et al., 2003; Pallier et al., 2011)

· Damage restricted to Broca’s area (the main language-related brain region) results in only minor language impairments, not fundamental issues with language (Linebarger et al., 1983; Mohr et al., 1978)

· Contra Grodzinsky (2000), there doesn’t seem to be any class of patients that is selectively impaired in a particular component of grammar (e.g., Wilson & Saygin, 2004), implying that core grammatical properties are not organized in a “syntacto-topic” fashion in the cortex

· The neuroimaging profile of “language areas” indicates that while these areas are sensitive to grammatical properties, their functions are not tied to particular grammatical operations but rather with the processing ramifications of them (Rogalsky & Hickok, 2011; Stowe et al., 2005; Matchin et al., 2014; Santi & Grodzinsky, 2012; Santi et al., 2015)
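
The content-addressable memory idea in the list above can be made concrete with a toy sketch. This is not the model from Lewis et al. (2006); the `Chunk` encoding, the feature names, and the match-count scoring are all assumptions made for illustration. The point is only that items are accessed by matching retrieval cues against their features, rather than by searching through positions in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A stored item, described by a set of features (hypothetical encoding)."""
    word: str
    features: frozenset


def retrieve(memory, cues):
    """Return the chunk whose features match the most retrieval cues.

    Every chunk is scored against the cues directly; position in the
    list plays no role, which is what makes access content-addressable.
    Ties go to the earlier chunk in `memory` (an arbitrary choice here).
    """
    return max(memory, key=lambda chunk: len(cues & chunk.features))


# "The dog that the cats chased ... was": at "was", look for a singular subject.
memory = [
    Chunk("dog", frozenset({"noun", "singular", "subject"})),
    Chunk("cats", frozenset({"noun", "plural", "subject"})),
]
target = retrieve(memory, frozenset({"noun", "singular", "subject"}))
print(target.word)  # dog
```

In a sketch like this, a distractor that shares some of the cues ("cats" matches two of the three) scores close to the target, which is one simple way to think about interference between similar items in memory.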

As I explain in more detail later in this post, a language faculty that makes prominent use of stored linguistic structures and a memory retrieval system operating over them allows us to make coherent sense out of these kinds of findings.

At any rate, it seems painfully true to me that the ‘syntacto-topic conjecture’ (Grodzinsky, 2006; Grodzinsky & Friederici, 2006), the attempt to neurally localize the modules of syntactic theories of GG (e.g., Move alpha, Binding principles, fundamental syntactic operations, etc.), has completely failed. Let me underscore that – there are no big chunks of dedicated “grammatical” cortex to be found in the brain, if grammatical is to be defined in these sorts of categories. In any case, did we want to abandon the Minimalist program and re-adopt Government and Binding (GB) theory in order to localize its syntactic modules? One of the virtues of the Minimalist program, in my view, is that it seeks a more fundamental explanation for the theoretical postulates of GB (see Norbert’s numerous blog discussions of this issue), which we didn’t want as the primitive foundations of language for reasons such as Darwin’s problem – the problem of how language emerged in the species during evolution. Incidentally, the lack of correspondence between GB and the brain is another reason to pursue something like the Minimalist program, which possesses a much slimmer grammatical processing profile that wouldn’t necessarily take up a huge swath of cortex. Positing a rich lexicon with a slim syntactic operation seems to me to be a very plausible way to connect up with what we know about the brain.

Except for the fact that I know of no linking theory between grammar and behavior for a Minimalist grammar aside from Phillips (1996). And even that linking theory really only addresses one issue listed above, the issue of derivational order – it did not answer a whole host of questions concerning the system writ large. Namely, it did not provide what I believe to be the critical level of representation for online sentence processing – stored structures. So I have no way of explaining the results of neuroimaging and neuropsychology experiments in Minimalist terms, meaning that the only options are: (1) adopt a lexicalist grammatical theory a la Culicover & Jackendoff (2005; 2006) and eschew many of the insights of modern generative grammar, or (2) develop a satisfactory linking theory for Minimalism (which I argue should incorporate stored structures, etc.).

It may be the case that syntacticians don’t care, because these concerns are not relevant to providing a theory of language as they’ve defined it. And I actually agree with this point – I don’t necessarily think that syntacticians ought to change the way they do business to accommodate these concerns. Here I strongly disagree with Jackendoff – I think there are good reasons to maintain the competence/performance distinction in pursuing a good description of the knowledge of language, because it allows them to focus and develop theories, whereas I think if they were to start incorporating all of this then they’d be paralyzed, because there is just too much going on in the world of behavior (modulo experimental syntax approaches, e.g. Sprouse, 2015, that are well-targeted within the domains of syntactic theory). I do recall a talk by Chomsky at CUNY 2012 where he pretty much said “hey guys stop looking at behavior there’s too much going on and you’ll get confused and go nowhere” – it doesn’t seem like patently bad advice.

However, I don’t think our studies of behavior and the brain have done nothing, and maybe it’s the right time for syntacticians, psycholinguists, and neuroscientists to start connecting everything up together.

There are also some very important reasons to care. If you want your theories to be taken seriously by psycholinguists and neurolinguists, you need to give them plausible ways in which your theoretical postulates can be used during online processing. The days of Friederici and Grodzinsky are numbered – Grodzinsky has already pretty much renounced the syntacto-topic conjecture (finally – Santi et al., 2015), and Friederici clings to what I think is a very hopeless position regarding Broca’s area and Merge (Zaccarella & Friederici, 2015). These are the only people with any clout in cognitive neuroscience who seriously engage with syntactic theory in generative grammar. Everyone else seems to be pretty much ignoring mainstream Generative Grammar. Is that what we want? I can imagine that this sort of stuff is important for intra- and inter-departmental collaboration, funding, etc.

There seems to be a decline in the purchase of generative grammar in the scientific community, which may only hasten with time and the eventual death of Chomsky. A good way to forestall or reverse this is by opening up a channel of communication with psychologists and neuroscientists through these specific linking theories (at least a sketch of one), not merely the promise of some possible linking theories (which appears to be what Norbert is telling me in our conversations). We need to actually make at least a rough sketch of a real linking theory in order to get this enterprise off the ground.

Secondly, it might be the case that introducing a plausible linking theory has ramifications for how you think about language and syntactic theory. There could be some very useful insights into syntactic theory to be gained once greater channels of communication are open and running.

I am worried that I cannot successfully combine my work in neuroscience with syntactic theory and generative grammar. The conference that I attend every year, the “Society for the Neurobiology of Language”, ought to be the most sympathetic place you’d find for GG to talk to neurobiology. In reality, there is hardly ever a peep about GG at these conferences. This ought to be a very disturbing state of affairs for you – it certainly is to me. The sometimes latent, sometimes explicit message that I keep receiving from my field is to stop caring about GG because it bears no relation to what we do. I reject this message, but in order to do meaningful work, I need to be armed with a good (sketch of a) linking theory. A big goal of this post is to solicit reactions and suggestions from syntacticians in developing this theory.

42 comments:

Omer, September 2, 2016 at 2:21 PM
I think a large portion of these problems stems from people conflating listedness and storedness. When generative syntacticians talk about the lexicon, they are talking about what needs to be listed (i.e., which units have properties that cannot be computed on the basis of the properties of their parts). I don't think most psychologists who (think they) are working on linguistic capacities use the term that way, though; they simply use it to refer to long-term memory of language-related things. Bizarrely (since they both should know better), Jackendoff and Culicover are among those caught in this confusion.

To see how deep this goes, the whole bottom-up structure building issue might very well be a red herring if we view it through the listedness vs. storedness prism. It may very well be that language production as well as processing deploy precompiled(=stored) units in a left-to-right/top-to-bottom fashion – and meanwhile, those units are compiled in the first place (from listed atoms) only if they fit correctly into the bottom-up schema that "competence" people have developed. (A useful if imperfect analogy is my brain storing the output of long multiplication for pairs of operands that I have multiplied often enough, and only if that <operand1, operand2, product> triad respects the "competence" rules for multiplication in the first place.)

This way of looking at things admittedly drives a deep, sharp wedge between competence and performance, and thus between theories of grammar and theories of grammar's use. But just because something is unfortunate – scientifically, and maybe even from a politics/funding perspective – doesn't mean it's not the truth.
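
The multiplication analogy above resembles memoization, and a toy sketch may make the listed/stored distinction easier to see. Everything here is an assumption for illustration: `long_multiply` stands in for the "competence" rule, and `stored` for the precompiled units that are reused at performance time. As a simplification, a result is stored on first use rather than only after it has been computed "often enough".

```python
stored = {}  # precompiled (operand1, operand2) -> product


def long_multiply(a, b):
    """The "competence" rule: build the product by repeated addition."""
    total = 0
    for _ in range(b):
        total += a
    return total


def multiply(a, b):
    """The "performance" route: reuse a stored result when one exists.

    A triad only ever enters `stored` as the output of `long_multiply`,
    so every stored unit respects the rule even though retrieving it
    does not re-run the rule.
    """
    if (a, b) not in stored:
        stored[(a, b)] = long_multiply(a, b)
    return stored[(a, b)]


print(multiply(12, 12))  # 144, computed by the rule and then stored
print(multiply(12, 12))  # 144, retrieved without re-deriving it
print(stored)            # {(12, 12): 144}
```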

William Matchin, September 3, 2016 at 7:06 AM
A big oversight of mine in this post was that Townsend & Bever (2001) do propose an explicit linking theory between syntax and sentence comprehension. They posit that we heuristically understand a sentence, and then go back through it a second time with an explicit syntactic derivation to confirm the interpretation (an analysis-by-synthesis approach). Their slogan is "we understand everything twice". I think Phillips (2012) clearly points out the flaws in this proposal (namely, that we don't understand everything twice), but Townsend & Bever had the virtue of making an explicit linking theory.

Colin Phillips, September 4, 2016 at 6:16 PM
The post refers to the imminent availability of Shota Momma's (very interesting) 2016 PhD thesis. It's already available, as it happens:
http://www.colinphillips.net/?page_id=53

Colin Phillips, September 4, 2016 at 7:05 PM
William: I'm unsure of why you're seeing a fundamental challenge here.

"Representation" and "memory" are basically names for the same thing. Similarly, operations that merge or otherwise connect two smaller representations entail memory access operations. We just tend to use different terms when operating at different levels of analysis.

Everybody is committed to having stuff stored in long-term memory. For some, the pieces are bigger, for others the pieces are smaller, but you've gotta have something. You seem to assume that the stored stuff comes in big pieces, and that this creates a challenge for linking theories, but I don't see the evidence or the challenge as so clear cut.

As for how we make use of grammatical knowledge in real time, we know way more about this than we did 20 years ago. And the evidence is pretty consistent: it's hard to find good evidence of delay in access to any grammatical knowledge. (For sure, sometimes things don't work out perfectly, but that's another story.)

The key feature that distinguishes comprehension and production tasks from most traditional grammatical work is uncertainty. Comprehension is the task of constructing a sound-meaning pairing with only the sound provided as guidance. Production is the task of constructing a sound-meaning pairing with only the meaning provided as guidance. The uncertainty inherent in these tasks brings challenges that are less of a concern for John or Jane Grammarian. But again I see no conflict.

I don't see clear evidence that such-and-such grammatical theory is better suited to capturing real-time processes. Whenever we have dug into such claims, we've generally found them to be not so persuasive.

I think that a lot of misunderstanding arises due to conflation of three different distinctions: (i) levels of analysis, (ii) tasks, and (iii) components of a cognitive architecture. Discussions of competence and performance almost always exacerbate things, because the terms mean different things in different contexts. And there is certainly plenty of mistrust between subfields. But from where I sit, I don't see a fundamental challenge. And among the growing community of researchers who are equally at home with linguistic flora and fauna and the ins-and-outs of parsing/production and memory models, I think the feeling is similar. There are lots of interesting things to work on, but we don't see a fundamental challenge.

Greg Kobele, September 5, 2016 at 6:39 AM
You might be interested in the work of John Hale, who has formulated explicit linking theories between minimalist grammars and behaviour (surprisal; entropy reduction). He has been working recently to relate this to fMRI data.

I have also proposed a memory based linking theory (adapted from Joshi's TAG based one and related to Gibson's DLT), the logical space surrounding which has been rigorously explored by Thomas Graf.

This is all predicated on the happy fact that there exist correct parsing algorithms for minimalist grammars, as first set out by Henk Harkema in his UCLA PhD thesis, where he shows that bottom-up and (predictive) top-down can be formulated in this context.

Recently Stabler has been exploring the merits of a probabilistic version of Henk's Top Down parser.

Makoto Kanazawa has shown that a very general construction gives predictive Earley parsers for essentially all restrictive grammar formalisms.
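
For readers unfamiliar with the two linking hypotheses named at the top of this comment, here is a minimal sketch of how they are commonly defined. The probabilities and entropies below are invented numbers, not the output of any parser; in practice they would be derived from a probabilistic grammar. The function names are hypothetical.

```python
import math


def surprisal(prefix_probs):
    """Surprisal (in bits) of each word, given prefix probabilities.

    prefix_probs[i] is the total probability of the first i words, with
    prefix_probs[0] == 1.0 for the empty prefix. The surprisal of word i
    is log2(prefix_probs[i - 1] / prefix_probs[i]).
    """
    return [
        math.log2(before / after)
        for before, after in zip(prefix_probs, prefix_probs[1:])
    ]


def entropy_reduction(entropies):
    """Entropy reduction at each word: the drop in uncertainty, floored at 0.

    entropies[i] is the uncertainty (in bits) about the rest of the
    analysis after i words; increases in entropy count as zero.
    """
    return [
        max(0.0, before - after)
        for before, after in zip(entropies, entropies[1:])
    ]


# Made-up values for a three-word sentence.
print(surprisal([1.0, 0.5, 0.25, 0.03125]))     # [1.0, 1.0, 3.0]
print(entropy_reduction([4.0, 2.5, 3.0, 0.0]))  # [1.5, 0.0, 3.0]
```

Both measures assign a number to each word, which is what lets them be compared against word-by-word reading times or brain signals.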

Unknown, September 5, 2016 at 7:14 AM
William - thanks for the provocative thoughts!

I totally agree with the challenge to specifying linking hypotheses, and also with the general sense of "the room" as to the importance of chunking. I think things might be less bleak than they look at first glance(??) We (me, John Hale, Christophe Pallier etc.) just began an NSF-funded project to push forward on the size (and kind) of these chunks (#1607251, #1607441). We are looking both at some of the standard NLP approaches vis-à-vis establishing a cutoff for word-sequences and also at the view that what gets chunked are parser operations (following Newell et al. on SOAR; applied to parsing in Lewis' 1993 dissertation and Hale's 2014 book). We test a few different grammars for defining these chunks (MG, TAG etc.) in the same project. Upshot: I think there are some promising linking hypotheses in the computational literature, but they haven't been applied to neuro data yet!

I disagree that there is "no linking theory between grammar and behavior for a Minimalist grammar aside from Phillips (1996)." Stabler, Hale, Kobele, Hunter and others have been pushing forward on a family of automata for parsing MGs. These can be as predictive as your sentence-processing theory demands (i.e. they span the GLC lattice). The bigger point goes back to Stabler's 1991 "Avoid the Pedestrian's Paradox" paper. That paper really nicely demonstrates that the "directionality" of the grammar and the directionality and/or predictiveness of the parser are totally orthogonal. We take a stab at applying these tools to query fMRI data in a recent paper (doi: 10.1016/j.bandl.2016.04.008)
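
The orthogonality point can be illustrated with a toy example. The sketch below uses a tiny context-free grammar, not a Minimalist grammar, and neither function is any of the parsers cited above; the grammar, the vocabulary, and the function names are assumptions for illustration. One and the same rule set is consulted by a predictive top-down recognizer and by a bottom-up (CYK-style) recognizer, and the two agree on every input.

```python
# One grammar, stated without any commitment to processing direction.
BINARY = {"S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V", "NP")]}
LEXICAL = {"Det": {"the"}, "N": {"dog", "cat"}, "V": {"chased"}}


def top_down(cat, words, start=0):
    """Predictive direction: expand `cat`, return every end position it reaches."""
    ends = set()
    if cat in LEXICAL:
        if start < len(words) and words[start] in LEXICAL[cat]:
            ends.add(start + 1)
    for left, right in BINARY.get(cat, []):
        for mid in top_down(left, words, start):
            ends |= top_down(right, words, mid)
    return ends


def bottom_up(words):
    """Data-driven direction (CYK): combine smaller spans into larger ones."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {c for c, vocab in LEXICAL.items() if word in vocab}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):
                for parent, rules in BINARY.items():
                    for left, right in rules:
                        if left in chart[(i, j)] and right in chart[(j, k)]:
                            cell.add(parent)
            chart[(i, k)] = cell
    return chart.get((0, n), set())


for sentence in ("the dog chased the cat", "dog the chased cat the"):
    words = sentence.split()
    print(len(words) in top_down("S", words), "S" in bottom_up(words))
# True True
# False False
```

The order in which structure is built differs between the two functions, but what counts as a sentence is fixed by `BINARY` and `LEXICAL` alone.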

Finally - can you embed links in these comments??

Unknown, September 5, 2016 at 3:01 PM
@William: I may have missed this, but what do you see as the main processing evidence for the necessity of having treelets in the grammar? I know you linked to Demberg & Keller and Momma, but perhaps you could summarize what you have in mind?