Saturday, September 3, 2016

Brains and syntax: part 2

This is the second part of William Matchin's paper. Thanks again to William for putting this down on paper and provoking discussion.

One (very rough) sketch of a possible linking theory between a minimalist grammar and online sentence processing

I am going to try to sketch out what I think is a somewhat reasonable picture of the language faculty given the insights of syntactic theory, psycholinguistics, and cognitive neuroscience. My sketch here takes some inspiration from TAG-based psycholinguistic research (e.g., Demberg & Keller, 2008) and the TAG-based syntactic theory developed by Frank (2002) (thanks to Nick Huang for drawing this work to my attention).

Figure from Frank (2002). The dissociation between the inputs and operations of basic structure building and online processing/manipulation of treelets is clearly exemplified in the grammatical framework of Frank (2002).

The essential qualities of this picture of the language faculty are as follows. Minimalism is essentially a theory of the objects of language, the syntactic representations that people have. These objects are TAG treelets. TAG is a theory of what people do with these objects during sentence processing. TAG-type operations (e.g., unification, substitution, adjunction, verification) may be somehow identifiable with memory retrieval operations, opening up a potentially general cognitive basis for the online processing component of the language faculty and leaving the language-specific component to Merge. This proposal severs any inherent connection between Merge and online processing. Although nothing in the proposal precludes the online implementation of Merge during sentence processing, much of sentence processing might proceed without implementing Merge at all, relying instead on TAG operations over stored treelets.

I start with what I take to be the essential components of a Minimalist grammar – the lexicon and the computational system (i.e., Merge). Things work essentially as a Minimalist grammar says – you have some lexical atoms, Merge combines these elements (bottom-up) to build structures that are interpreted by the semantic and phonological systems, and there are some principles – some of them part of cognitive endowment, some of them “third factors” or general laws of nature or computation – that constrain the system (Chomsky, 1995; 2005).

The key difference that I propose is that complex derived structures can be stored in long-term memory. Currently, Minimalism states that the core feature of language, recursion, is the ability to treat derived objects as atoms. In other words, structures are treated as words, and as such are equally good inputs to Merge. However, the theory attributes the property of long-term storage only to atoms, and denies long-term storage to structures. Why not make structures fully equivalent to the atoms in their properties, including both Merge-ability AND long-term store-ability?

These stored structures or treelets can either be fully-elaborated structures with the leaves attached, or they might be more abstract nodes, allowing different lexical items to be inserted. It seems important from the psycholinguistic literature to have abstract structural nodes (e.g. NP, VP), so this theory would have to provide some means of taking a complex structure created by Merge and modifying it appropriately to eliminate the leaves (and perhaps many of the structural nodes) of the structure through some kind of deletion operation.
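As a very rough illustration, the kind of deletion operation described above might look like this in code. The tuple encoding of trees, the function name, and the use of `None` for open slots are all my own assumptions for the sketch, not part of any existing implementation:

```python
# A toy sketch of the deletion operation suggested above: take a fully
# lexicalized structure, as built by Merge, and strip its leaves to yield
# an abstract treelet with open slots. Trees are (label, children)
# tuples; None marks an open slot.

def delexicalize(tree):
    """Replace each preterminal's lexical leaf with an open slot (None)."""
    label, children = tree
    # A preterminal dominates a single string leaf, e.g. ("N", ["dog"]).
    if len(children) == 1 and isinstance(children[0], str):
        return (label, [None])
    return (label, [delexicalize(child) for child in children])

full_structure = ("NP", [("Det", ["the"]), ("N", ["dog"])])
template = delexicalize(full_structure)
# template is now an abstract NP treelet:
# ('NP', [('Det', [None]), ('N', [None])])
```

A further pass could prune internal nodes as well, yielding still more schematic templates (e.g., a bare NP slot), which is the other option the paragraph above mentions.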

Treelets are the point of interaction between the syntactic system (essentially a Minimalist grammar) and the memory system. It may be the top-down activation of memory retrieval operations that “saves” structures as treelets. Memory operations do much of the work of sentence processing – retrieving structures and unifying/substituting them appropriately to efficiently parse sentences (see Demberg & Keller, 2008 for an illustration). Much of language acquisition amounts to refining the attention/retrieval operations, as well as the set of treelets (and the prominence/availability of such treelets), that the person has available to them.

I think that there are good reasons to think that the retrieval mechanisms and the stored structures/lexical items live in language cortex. Namely, retrieval operations live in the pars triangularis of Broca’s area and stored structures/lexical items live in posterior temporal lobe (somewhere around the superior temporal sulcus/middle temporal gyrus).

This approach pretty much combines the Minimalist generative grammar and the lexicalist/TAG approaches. Note also that any retrieved treelet was itself created through applications of Merge. So when you look at a structure that is ultimately uttered by a person, it is both true that the syntactic derivation of this structure was generated bottom-up in accordance with the operations and principles of a Minimalist grammar, AND that the person used it by retrieving a stored treelet. We can (hopefully) preserve both insights – bottom-up derivation with stored treelets that can be targeted by working memory operations.

One remaining issue is how treelets are combined and lexical items inserted into them – this could be a substitution or unification operation from TAG, but Merge itself might also work for some cases (suggesting some role for Merge in actual online processing).
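To make the treelet-combination idea concrete, here is a minimal sketch of substitution over treelets. Everything in it – the `Node` class, the convention of matching open slots by category label, and the example treelets – is an illustrative assumption on my part, not a formal statement of TAG:

```python
# A toy sketch of treelet combination via substitution: stored treelets
# carry open slots (childless category nodes), and retrieved treelets
# are attached at the first open slot with a matching label.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def is_open(self):
        # An open substitution site: a category node with no children yet.
        return not self.children

    def substitute(self, label, subtree):
        """Attach subtree at the first open site with a matching label."""
        for i, child in enumerate(self.children):
            if child.is_open() and child.label == label:
                self.children[i] = subtree
                return True
            if child.substitute(label, subtree):
                return True
        return False

    def leaves(self):
        if not self.children:
            return [self.label]
        return [leaf for c in self.children for leaf in c.leaves()]

def word(pos, form):
    return Node(pos, [Node(form)])

# A stored clausal treelet with two open NP slots:
# [S [NP] [VP [V chased] [NP]]]
clause = Node("S", [Node("NP"),
                    Node("VP", [word("V", "chased"), Node("NP")])])

# Retrieve nominal treelets and substitute them into the open slots.
clause.substitute("NP", Node("NP", [word("Det", "the"), word("N", "dog")]))
clause.substitute("NP", Node("NP", [word("Det", "the"), word("N", "cat")]))

assert " ".join(clause.leaves()) == "the dog chased the cat"
```

On the view sketched in the post, the same attachment step might instead be carried out by Merge itself for some cases; the point of the sketch is only that slot-filling over stored chunks can do much of the combinatorial work online.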

I think this proposal starts to provide potential insights into language acquisition. Say you’re a person walking around with this kind of system – you’ll want to start directing your attentional/working memory system to all these objects being generated by Merge and creating thoughts. You’ll also (implicitly) realize that other people are saying stuff that connects to your own system of thought, and you’ll start to align your set of stored structures and retrieval operations to match the patterns of what you’re seeing in the external world. This process is language acquisition, and it creates a convergence on the set of features, stored structures, and retrieval operations that are used within a language.

This addresses some of the central questions I posed earlier:

When processing a sentence, do I expect Merge to be active? Or not?

- Not necessarily, maybe minimally or not at all for most sentences.

What happens when people process things less than full sentences (like a little NP – “the dog”)? What is our theory of such situations?

- A little treelet corresponding to that sub-sentence structure is retrieved and interpreted.

Do derivations really proceed from the bottom up, or can they satisfactorily be switched to go top-down/left-right using something like Merge right (Phillips 1996)?

- Syntactic derivations are bottom-up in terms of Merge, but sentence processing occurs left-to-right roughly along the lines of TAG-based parsing frameworks (Demberg & Keller, 2008).

What happens mechanistically when people have to revise structure (e.g., after garden-pathing)?

- De-activate the current structure, retrieve new treelets/lexical items that fit better with what was presented. Lots of activity associated with processing lexical items/structures and memory retrievals, but there may not be an actual activation/implementation of Merge.

Are there only minimal lexical elements and Merge? Or are there stored complex objects, like “treelets”, constructions or phrase-structure rules?

- Yes, there are treelets, but we have an explanation for why there are treelets – they were created through applications of Merge at some point in the person’s life, but not necessarily online during sentence processing.

How does the syntactic system interact with working memory, a system that is critical for online sentence processing?

- The point of interaction between syntax and memory is the treelet. Somehow certain features encoded on treelets have to be available to the memory system.

Now that I have these answers, I can proceed to do my neuroimaging and neuropsychology experiments with testable predictions regarding how language is implemented in the brain:

What’s the function of Broca’s area?

- Retrieval operations that are specialized to operate over syntactic representations.
- This is why, when Broca’s area is destroyed, you are still left with a bunch of treelets that can be activated in comprehension/production and used pretty effectively, although with less strategic control over them.
- We expect patients with damage to Broca’s area to be able to basically comprehend sentences, but to have real trouble in cases requiring recovery/revision, long-distance dependencies, prediction, and perhaps second language acquisition.

What’s the function of posterior temporal areas?

- Lexical storage, including treelets.
- We expect activation for basic sentence processing, more activation for ambiguity/garden-path sentences when more structural templates are activated.
- We expect patients with posterior temporal damage to have some real problems with sentence comprehension/production.

Where are fundamental structure building operations in the brain, e.g. Merge?

- Merge is a subtle neurobiological property of some kind.
- It might be in the connections between cortical areas, perhaps involving subcortical structures, or some property of individual neurons, but regardless, there isn’t a “syntax area” to be found.

What are the ramifications of this proposal for the standard contrast of sentences > lists that is commonly used to probe sentence processing in the brain?

- This contrast will highlight all sorts of things, likely including the activation of treelets, memory retrieval operations, and semantic processing, but it might not be expected to drive activation for basic syntactic operations, i.e., Merge.

Here I have tried to preserve Merge as the defining and simple feature of language – it’s the thing that allows people to grow structures. This also clearly separates Merge from the issue of “what happens during sentence processing”, and really highlights the core of language as something not directly tied to communication. Essentially, the theory of syntax becomes the theory of structures and dependencies, not of producing and understanding sentences. On this conception of language, there is this Merge machinery creating structures, perhaps new in evolution, that can be harnessed by an (evolutionarily older) attentional/memory system for the purposes of producing and comprehending sentences through storing treelets in long-term memory. Merge is clearly separate from this communication/memory system, and is an engine of thought. Learning a language then becomes a matter of refining the retrieval operations and the kinds of stored treelets you have, optimized for communicating with others over time.

If this is a reasonable picture of the language faculty, thinking along these lines might start to help resolve some conundrums in the traditional domain of syntax. For example, there is often the intuition that syntactic islands are somehow related to processing difficulty (Kluender & Kutas 1993; Berwick & Weinberg, 1984), but there is good evidence that islands cannot be reduced to online processing difficulty or memory resource demands (Phillips, 2006; Sprouse et al., 2012). One approach might be to attribute islands to a processing constraint that somehow becomes grammaticalized (Berwick & Weinberg, 1984). The present framework provides a way for thinking about this issue, because the interaction between syntax and the online processing/memory system is specified. I have some more specific thoughts on this issue that might take the form of a future post.

At any rate, I would love any feedback on this type of proposal. Do we think this is a sensible idea of what the language faculty looks like? What are some serious objections to this kind of proposal? If this is on the right track, then I think we can start to make some more serious hypotheses about how language is implemented in the human brain beyond Broca’s area = Merge.

Many thanks to Nick Huang (particularly for pointing out relevant pieces of literature), Marta Ruda, Shota Momma, Gesoel Mendes, and of course Norbert Hornstein for reading this and giving me their thoughts. Thanks to Ellen Lau, Alexander Williams, Colin Phillips and Jeff Lidz for helpful discussion on these topics. Any failings are mine, not theirs.


Berwick, R. C., & Weinberg, A. S. (1983). The role of grammars in models of language use. Cognition, 13(1), 1-61.

Berwick, R., and Weinberg, A.S. (1984). The grammatical basis of linguistic performance. Cambridge, MA: MIT Press.

Bresnan, J. (2001). Lexical-Functional Syntax. Oxford: Blackwell.

Chomsky, N. (2005). Three factors in language design. Linguistic inquiry, 36(1), 1-22.

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT press.

Culicover, P. W., & Jackendoff, R. (2005). Simpler syntax. Oxford University Press on Demand.

Culicover, P. W., & Jackendoff, R. (2006). The simpler syntax hypothesis. Trends in cognitive sciences, 10(9), 413-418.

Demberg, V., & Keller, F. (2008, June). A psycholinguistically motivated version of TAG. In Proceedings of the 9th International Workshop on Tree Adjoining Grammars and Related Formalisms (pp. 25-32). Tübingen.

Embick, D., Marantz, A., Miyashita, Y., O'Neil, W., & Sakai, K. L. (2000). A syntactic specialization for Broca's area. Proceedings of the National Academy of Sciences, 97(11), 6150-6154.

Fedorenko, E., Behr, M. K., & Kanwisher, N. (2011). Functional specificity for high-level linguistic processing in the human brain. Proceedings of the National Academy of Sciences, 108(39), 16428-16433.

Fodor, J. A., Bever, T. G., & Garrett, M. F. (1974). The psychology of language: An introduction to psycholinguistics and generative grammar. New York: McGraw-Hill.

Frank, R. 2002. Phrase Structure Composition and Syntactic Dependencies. Cambridge, Mass: MIT Press.

Grodzinsky, Y. (2000). The neurology of syntax: Language use without Broca's area. Behavioral and brain sciences, 23(01), 1-21.

Grodzinsky, Y., & Friederici, A. D. (2006). Neuroimaging of syntax and syntactic processing. Current opinion in neurobiology, 16(2), 240-246.

Grodzinsky, Y. (2006). A blueprint for a brain map of syntax. Broca’s region, 83-107.

Jackendoff, R. (2003). Précis of foundations of language: brain, meaning, grammar, evolution. Behavioral and Brain Sciences, 26(06), 651-665.

Joshi, A. K., & Schabes, Y. (1997). Tree-adjoining grammars. In Handbook of formal languages (pp. 69-123). Springer Berlin Heidelberg.

Kluender, R., & Kutas, M. (1993). Subjacency as a processing phenomenon. Language and cognitive processes, 8(4), 573-633.

Lewis, S., & Phillips, C. (2015). Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research, 44(1), 27-46.

Lewis, R. L., & Vasishth, S. (2005). An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science, 29(3), 375-419.

Lewis, R. L., Vasishth, S., & Van Dyke, J. A. (2006). Computational principles of working memory in sentence comprehension. Trends in cognitive sciences, 10(10), 447-454.

Linebarger, M. C., Schwartz, M. F., & Saffran, E. M. (1983). Sensitivity to grammatical structure in so-called agrammatic aphasics. Cognition, 13(3), 361-392.

Lukyanenko, C., Conroy, A., & Lidz, J. (2014). Is she patting Katie? Constraints on pronominal reference in 30-month-olds. Language Learning and Development, 10(4), 328-344.

Matchin, W., Sprouse, J., & Hickok, G. (2014). A structural distance effect for backward anaphora in Broca’s area: An fMRI study. Brain and language, 138, 1-11.

Miller, G. A., & Chomsky, N. (1963). Finitary models of language users. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of Mathematical Psychology (Vol. 2). New York: Wiley.

Mohr, J. P., Pessin, M. S., Finkelstein, S., Funkenstein, H. H., Duncan, G. W., & Davis, K. R. (1978). Broca aphasia: Pathologic and clinical. Neurology, 28(4), 311-324.

Momma, S. (2016). Doctoral dissertation, University of Maryland, Department of Linguistics.

Musso, M., Moro, A., Glauche, V., Rijntjes, M., Reichenbach, J., Büchel, C., & Weiller, C. (2003). Broca's area and the language instinct. Nature neuroscience, 6(7), 774-781.

Omaki, A., Lau, E. F., Davidson White, I., Dakan, M. L., Apple, A., & Phillips, C. (2015). Hyper-active gap filling. Frontiers in psychology, 6, 384.

Pallier, C., Devauchelle, A. D., & Dehaene, S. (2011). Cortical representation of the constituent structure of sentences. Proceedings of the National Academy of Sciences, 108(6), 2522-2527.

Phillips, C. (1996). Order and structure (Doctoral dissertation, Massachusetts Institute of Technology).

Phillips, C. (2006). The real-time status of island phenomena. Language, 795-823.

Rogalsky, C., & Hickok, G. (2011). The role of Broca's area in sentence comprehension. Journal of Cognitive Neuroscience, 23(7), 1664-1680.

Santi, A., & Grodzinsky, Y. (2012). Broca's area and sentence comprehension: A relationship parasitic on dependency, displacement or predictability?. Neuropsychologia, 50(5), 821-832.

Santi, A., Friederici, A. D., Makuuchi, M., & Grodzinsky, Y. (2015). An fMRI Study Dissociating Distance Measures Computed by Broca’s Area in Movement Processing: Clause boundary vs Identity. Frontiers in psychology, 6, 654.

Sprouse, J. (2015). Three open questions in experimental syntax. Linguistics Vanguard, 1(1), 89-100.

Sprouse, J., Wagers, M., & Phillips, C. (2012). A test of the relation between working-memory capacity and syntactic island effects. Language, 88(1), 82-123.

Stowe, L. A., Haverkort, M., & Zwarts, F. (2005). Rethinking the neurological basis of language. Lingua, 115(7), 997-1042.

Stowe, L. A. (1986). Parsing WH-constructions: Evidence for on-line gap location. Language and cognitive processes, 1(3), 227-245.

Vosse, T., & Kempen, G. (2000). Syntactic structure assembly in human parsing: a computational model based on competitive inhibition and a lexicalist grammar. Cognition, 75(2), 105-143.

Wilson, S. M., & Saygın, A. P. (2004). Grammaticality judgment in aphasia: Deficits are not specific to syntactic structures, aphasic syndromes, or lesion sites. Journal of Cognitive Neuroscience, 16(2), 238-252.

Zaccarella, E., & Friederici, A. D. (2015). Merge in the human brain: A sub-region based functional investigation in the left pars opercularis. Frontiers in psychology, 6.


  1. You may want to check out B. Srinivas's work on supertagging.

    1. Like the stuff here?:

  2. This picture of the world is extremely appealing to me. I always felt like Frank's (2002) Minimalism/TAG duality didn't get the attention it deserves, and using this duality to represent competence vs. performance seems, at the very least, an interesting path to go down.

  3. Empirical grammatical evidence for the kind of stored structures you suggest comes from phrasal idioms (see, e.g., Marantz 1997); it strikes me that exploring the neurological signature of idioms versus syntactically equivalent compositional structures might be an interesting way of trying to confirm your hypothesis.

    Also, I would be remiss as a Berkeleyite if I didn't point out that Construction Grammarians have spent the last 50 years arguing that stored structures associated with stored meanings exist. Of course, straight CG does not explain why these stored structures still follow normal syntactic constraints.

    1. Thanks for the input. Under the current proposal, though, most sentences are like idioms but lack a stored semantic interpretation; the syntax is stored for both idioms and most regular sentences (notwithstanding whatever substitution still needs to occur), but the semantics of most normal sentences is compositional.

      On your second point, this fact would be explained by the idea that every single stored structure is a syntactic object created by the same combinatorial operations with the same constraints.

    2. I 100% agree with that post. It is important to make the distinction between the primitive units of the lexicon and whatever representations are stored in long-term memory.

    3. Thanks to both for the helpful posts.

  4. I think it's easy to overestimate the degree to which TAG is more "treelet-based" than (say) minimalist syntax. As soon as you move to having things like subcategorization frames be part of a lexical item, then for many purposes you've got something equivalent to a TAG elementary tree. So for example if the verb 'give' comes with a subcategorization frame that says '[__ NP PP]' (and we understand this to mean that if you combine it with an NP and a PP then you've got a VP which has those three constituents as daughters), then it's not that different from writing down a tree which has VP at its root, a V 'give' as one daughter, and daughter-less NP and PP nodes as its other two daughters. In a minimalist context we tend to think of the tree as only appearing once the required merge operations have taken place -- but since the presence of the subcategorization features means that those merge operations are required, i.e. there's no way to use 'give' in a sentence without those merge operations happening, this is almost just a difference in notation. (The merge operations "have to be part of the derivation" as soon as you decide that 'give' is a part of it.)
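    To make this near-equivalence concrete, here is a toy encoding of the two notations. The dictionary frame, the `project` function, and the tuple trees are hypothetical conveniences of mine, not either framework's official machinery:

```python
# A toy comparison of the two notations under discussion: a lexical item
# carrying a subcategorization frame vs. a TAG-style elementary tree
# written down directly. Trees are (label, children) tuples; empty
# children lists mark open (daughter-less) slots.

# Encoding 1: 'give' as a lexical item with the subcat frame [__ NP PP].
give_item = {"head": "give", "cat": "V", "frame": ["NP", "PP"]}

def project(item):
    """Build the local tree that the frame obliges Merge to create."""
    head = (item["cat"], [item["head"]])
    slots = [(slot, []) for slot in item["frame"]]
    return ("VP", [head] + slots)

# Encoding 2: the corresponding elementary tree, stated directly.
give_tree = ("VP", [("V", ["give"]), ("NP", []), ("PP", [])])

# The two notations determine the same chunk of structure.
assert project(give_item) == give_tree
```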

    There are of course differences between TAGs and minimalism in general -- roughly, it's the difference between adding movement to the basic system of merge/substitution described above, or adding TAG-adjunction to it. And since TAG-adjunction is inherently a tree-based operation, it is natural for the system to work with "trees all the way down" (i.e. I guess you can do adjunctions before any substitutions). But the fact remains, I think, that a minimalist-style lexical item adorned with some collection of features dictating that this lexical item in effect "brings with it" certain combinatorial operations, is a "chunk" of approximately the same size as a TAG elementary tree.

    The other fly in the ointment here is perhaps the fact that work in TAG (including Bob Frank's) often takes elementary tree size to be bounded by the notion of an *extended projection*, whereas minimalists write down encodings of the stuff that needs to accompany a head only within its (maximal) projection itself. So it might look like the treelet associated with 'give' goes up to the TP node in a TAG, whereas the treelet that is in effect associated with 'give' in minimalism only goes up to VP. But one still needs to state somewhere that, for example, Ts take VPs as their complements -- it's just that the TAG folks have bitten the bullet and put this information in the same place as all subcategorization information, whereas minimalists (if my impression is correct) are still umm-ing and err-ing a little bit on how to encode extended projections. So again I think appearances are deceiving.

    Lastly, it seems relevant to just mention that Bob Frank's motivation for mixing TAG-style mechanics and minimalist-style mechanics in his 2002 book didn't have anything to do with access-via-chunking or predictive processing or anything like that -- it was purely motivated by the fact that minimalist-style mechanics seem to do a good job of dealing with the bounded, relatively local dependencies (approximately, things that were within a kernel sentence), whereas TAG-style mechanics seem to do a good job of dealing with the longer-distance dependencies (approximately, things that had involved generalized transformations right from the very early days). His neat observation was that minimalism's (independently motivated) move to all generalized transformations got rid of the hook on which the distinction between these two kinds of phenomena could be hung. His use of TAG is one way to try to bring back distinct mechanisms for working at the larger scale; phases are another.

    1. Cedric Boeckx reinforced to me this notion as well - that Minimalism is still very "lexicalist". This point is well-taken. But I also want to keep in mind that certain facts that are currently analyzed as grammatical or syntactic might be re-analyzed as processing facts once we start taking the idea of treelets in processing more seriously. I have mentioned occasionally the idea that this can be done for islands. Along these lines, perhaps we can eliminate subcategorization information from the lexicon if we assume that the facts that subcategorization accounts for can be attributed to the way that treelets are used in processing.

      The following is extremely speculative. One way of analyzing this is that the core syntactic operations can put verbs together with complements however they want and do not care about subcategorization. However, it turns out that, in communication with particular verbs, it becomes useful to have the automatic retrieval of particular complement treelets when encountering those verbs. In the adult grammar, not getting those complements means that the sentences feel bad.

      I feel like I'm simply restating what it means to have the vagaries of particular words encoded "in the lexicon". I'm replacing the lexicon with properties of the memory/treelet system associated with online processing, if the lexicon is meant to be the ultimate primitives of the syntax.

    2. > But I also want to keep in mind that certain facts that are currently
      > analyzed as grammatical or syntactic might be re-analyzed as
      > processing facts
      I think that this is a very reasonable enterprise.

      > once we start taking the idea of treelets in processing
      > more seriously.
      I think that this is also very reasonable. However, there are a number of ways to do this. My impression of the Demberg approach is that she wants to reify 'treelets in processing' in the grammatical model itself. One way to think of her PLTAG is that she identifies something like a top down parsing strategy for TAGs, and then redefines the TAG framework so that the objects of the new grammar, when parsed bottom-up, mimic the top-down parsing strategy. This sort of approach is well-known in the computer science literature, where, for example, a left corner parser can be thought of as a top-down parse of a left-corner transformed grammar.

      An alternative is to find your notion of treelet, not in the grammar, but rather in the action of the parser. So when you prime a treelet, you are not adding some weight to a lexical item, but rather to a sequence of parsing steps. Here, the grammar specifies the compositional structure of linguistic expressions, and the parser specifies how this structure is inferred for percepts on the fly.

    3. A bit more needs to be said to make the argument that syntactic priming reflects memoization of parser steps as opposed to activation of constructions or rewrite rules - my understanding is that we have both comprehension-to-comprehension and comprehension-to-production priming, and their magnitude appears to be similar (e.g. Tooley and Bock 2014, On the parity of structural persistence in language production and comprehension).

    4. Greg's saying that treelets are part of the parser, not the grammar. I totally agree with this statement - I want to make the claim that treelets are part of the faculty of language, but are not part of the grammar. So my notion of treelets I think is equivalent to the notion of "memoization" of parser steps - they are the same thing.

      See Shota 2016 (dissertation) for an excellent argument that the processing system involved in language production and comprehension is the same.

  5. Thanks to William Matchin for a fascinating article.

    I have a small concern regarding an example given of a potential application of a linguistic-neurologic theory. It was mentioned that perhaps islands could be treated as a grammaticalized processing constraint. This seems to me to be stretching the domain of grammaticalization a little far. I may be quite wrong, but my understanding is that for something to be grammaticalized it would need to be manifest in some way in the grammar of a language itself. Furthermore, it would need to be a notion (either syntactic or semantic) that would lend itself to interpretation. So plurality can be grammaticalized as 's' in English or as 'e/en' in German because plurality is a semantically interpretable notion, or structural case is grammaticalized as a way of signaling displacement. What exactly would be the nature of the processing constraint if it were to be grammaticalized? Something would need to be refined, either about the notion of a processing constraint or of grammaticalization.
    Just my two cents.

    1. Thanks for the input. I agree with you that this is the standard notion of talking about grammaticalization that is somewhat different from my intentions with this paper. My intention was to attempt to capture the intuition behind those who think that islands are the result of grammaticalization of processing constraints (e.g., Berwick & Weinberg). I tried to capture this intuition by the following. Online processing makes heavy use of a set of complex structures stored in long-term memory that were set during language acquisition, and the nature of these stored structures is an interaction between the constraints of grammar (i.e., Merge and lexicon) and processing considerations (e.g., prediction of syntactic content, minimization of working memory demands, facilitation of incremental interpretation). I assume islands are not frequent visitors in the child's input. I then propose that this set of treelets and memory retrieval operations do not allow the processing of (at least some) islands.

      If this is not grammaticalization, that's fine by me - what's more important is that island constraints are not reduced to online processing issues, which appears to be empirically false (e.g., Sprouse et al., 2012), but rather that island phenomena result from the form of the language faculty of the adult speaker-hearer, whose form is determined in part by processing constraints.