Sunday, February 12, 2017

Strings and sets

I have argued repeatedly that the Minimalist Program (MP) should be understood as subsuming earlier theoretical results rather than replacing them. I still like this way of understanding the place of MP in the history of GG, but there is something misleading about it if taken too literally. Not wrong exactly, but misleading. Let me explain.

IMO, MP is to GB (my favorite exemplar of an earlier theory) as Bounding Theory is to Ross’s Islands. Bounding Theory takes as given that Ross’s account of islands is more or less correct and then tries to derive these truths from more fundamental assumptions.[1] Thus, in one important sense, Bounding Theory does not substitute for Ross’s but aims to explain it. Put another way, it aims to conserve the results of Ross’s theory, more or less.[2]

Just as accurately, however, Bounding Theory does substitute for Ross’s. How so? It conserves but does not recapitulate it. Rather it explains why the things on Ross’s list are there. Furthermore, if successful it will add other islands to Ross’s inventory (e.g. Subject Condition effects) and make predictions that Ross’s did not (e.g. successive cyclicity). So conceived, Ross’s islands are explananda for which Bounding Theory is the explanans.

Note, and this is important, given this logic Bounding Theory will inherit any (empirical) problems for Ross’s generalizations. Pari passu for GB and MP. I mention this not because it is the topic of today’s sermonette, but just to observe that many fail to appreciate this when criticizing MP. Here’s what I mean.

One way MP might fail is in adopting the assumption that GBish generalizations are more or less accurate. If this assumption is incorrect, then the MP story fails in its presuppositions. And as all good semanticists know, this is different from failing in one’s assertions. Failing this way makes you not so much wrong as uninteresting. And MP is interesting, just as Bounding Theory is interesting, to the degree that what it presupposes is (at least) on the right track.[3]

All of this is by way of (leisurely) introduction to what I want to talk about below. Of the changes MP has suggested I believe that the most (or, to be mealy mouthed, one of the most) fundamental has been the proposal that we banish strings as fundamental units of grammar. This shift has been long in coming, but one way of thinking about Chomsky’s set theoretic conception of Merge is that it dislodges concatenation as the ontologically (and conceptually) fundamental grammatical relation. Let me flesh this out a bit.

The earliest conception of GG took strings as fundamental, strings just being a series of concatenated elements. In Syntactic Structures (SS) (and LSLT for which SS was a public relations brochure) kernel sentences were defined as concatenated objects generated by PS rules. Structural Descriptions took strings as inputs and delivered strings (i.e. Structural Changes) as outputs (that’s what the little glide symbol (which I can’t find to insert) connecting expressions meant). Thus, for example, a typical rule took as input things like (1) and delivered changes like (2), the ‘^’ representing concatenation. PS rules are sets of such strings and transformations are sets of sets of such strings. But the architecture bottoms out in strings and their concatenative structures.[4]

(1)  X^DP1^Y^V^DP2^Z
(2)  X^DP2^Y^V+en^by^DP1
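
For those who like to see this sort of thing spelled out, here is a minimal Python sketch of a rule that takes a string like (1) and returns a string like (2). The function names (`passive`, `show`) and the sample words are my own illustrative assumptions, not anything from SS or LSLT, and the output deliberately mirrors (2) as printed above rather than the full passive rule.

```python
def passive(terms):
    """Map a structural description shaped like (1) onto a change shaped like (2).

    `terms` is a 6-tuple (X, DP1, Y, V, DP2, Z) of concatenated elements.
    Hypothetical helper: it only reorders terms and adds 'en' and 'by'.
    """
    x, dp1, y, v, dp2, z = terms
    # Mirrors (2) as printed: the two DPs swap places, V gains +en,
    # and the original first DP ends up after 'by'.
    return (x, dp2, y, v + "+en", "by", dp1)


def show(terms):
    """Spell out a sequence of terms with '^' marking concatenation."""
    return "^".join(t for t in terms if t)


sd = ("", "the-dog", "", "chase", "the-cat", "")
print(show(sd))           # structural description
print(show(passive(sd)))  # structural change
```

The point of the sketch is only that the rule is stated over positions in a sequence: everything it does is defined in terms of what precedes what.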

This all goes away in merge based versions of MP.[5] Here phrase markers (PM) are sets, not strings and string properties arise via linearization operations like Kayne’s LCA which maps a given set into a linearized string. The important point is that sets are what the basic syntactic operation generates, string properties being non-syntactic properties that only obtain when the syntax is done with its work.[6] It’s what you get as the true linguistic objects, the sets, get mapped to the articulators. This is a departure from earlier conceptions of grammatical ontology.
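
The contrast between the two ontologies can be sketched in a few lines of Python. This is a toy under loud assumptions: `merge` here is bare set formation with no labels, and `linearize` is emphatically not Kayne's LCA; it just reads order off an externally stipulated `rank` table, to make the point that the order has to come from somewhere outside the set.

```python
def concat(a, b):
    """String-style combination: order is part of the object."""
    return a + b


def merge(a, b):
    """Set-style combination: the result carries no order at all."""
    return frozenset({a, b})


def leaves(obj):
    """Yield the lexical items inside a (possibly nested) merged object."""
    if isinstance(obj, frozenset):
        for member in obj:
            yield from leaves(member)
    else:
        yield obj


def linearize(obj, rank):
    """Impose an order from outside, using a stipulated rank table."""
    return tuple(sorted(leaves(obj), key=rank.__getitem__))


# Concatenation distinguishes the two orders; Merge does not.
print(concat(("the",), ("dog",)) == concat(("dog",), ("the",)))
print(merge("the", "dog") == merge("dog", "the"))

pm = merge("saw", merge("the", "dog"))
print(linearize(pm, {"saw": 0, "the": 1, "dog": 2}))
```

Nothing in `pm` itself says whether "saw" comes first or last. That information lives entirely in the mapping applied after the set has been built.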

This said it’s an idea with many precursors. Howard Lasnik has a terrific little paper on this in the Aspects 50 years later (Gallego and Ott eds, a MITWPL product that you can download here). He reviews the history and notes that Chomsky was quite resistant in Aspects to treating PMs as just coding for hierarchical relationships, an idea that James McCawley, among others, had been toying with. Howard reviews Chomsky’s reasoning and highlights several important points that I would like to quickly touch on here (but read the paper, it’s short and very very sweet!).

He notes several things. First, one of Chomsky’s key arguments for his revised conception in Aspects revolved around eliminating some possible but non-attested derivations (see p. 170). Interestingly, as Howard notes, these options were eliminated in any theory that embodied cyclicity. This is important, for when minimalist Chomsky returns to Generalized Transformations as the source of recursion, he parries the problems he noted in Aspects by incorporating a cyclic principle (viz. the Extension Condition) as part of the definition of Merge.[7]

Second, X’ theory was an important way station in separating out hierarchical dependencies from linear ones in that it argued against PS rules in Gs. By dumping PS rules, the relation between such rules and the string features of Gs was conceptually weakened.

Despite this last point, Lasnik’s paper highlights the Aspects arguments against a set based conception of phrase structure (i.e. in favor of retaining string properties in PS rules). This is section 3 of Howard’s paper. It is a curious read for a thoroughly modern minimalist, for in Aspects we have Chomsky arguing that it is a very bad idea to eliminate linear properties from the grammar, as was being proposed by, among others, James McCawley. Uncharacteristically (and I mean this as a compliment), Chomsky’s reasoning here is largely empirical. Aspects argues that, when one looks, the Gs of the period presupposed some conception of underlying order in order to get the empirical facts to fit and that this presupposition fits very poorly with a set theoretic conception of PMs (see Aspects: 123-127). The whole discussion is interesting, especially the discussion of free word order languages and scrambling. The basic observation is the following (126):

In every known language the restrictions on order [even in scrambling languages, NH] are quite severe, and therefore rules of realization of abstract structures are necessary. Until some account of such rules is suggested, the set-system simply cannot be considered seriously as a theory of grammar.

Lasnik argues, plausibly, that Kayne’s LCA offered such an account and removed this empirical objection against eliminating string information from basic syntactic PMs.

This may be so. However, from my reading of things I suspect that something else was at stake. Chomsky has not, on my reading, been a huge fan of the LCA, at least not in its full Kaynian generality (see note 6). As Howard observes, what he has been a very big fan of is the observation, going back at least to Reinhart, that, as he says in the Black Book (334), “[t]here is no clear evidence that order plays a role at LF or in the computation from N [numeration, NH] to LF.”

Chomsky’s reasoning is Reinhart’s on steroids. What I mean is that Reinhart’s observations, if memory serves, are largely descriptive, noting that anaphora is largely insensitive to order and that c-command is all that matters in establishing anaphoric dependencies (an important observation to be sure and one that took some subtle argumentation to establish).[8] Chomsky’s observations go beyond this in being about the implications of such lacunae for a theory of generative procedures. What’s important wrt linear properties and Gs is not whether linearized order plays a discernible role in languages, of course it does, but whether these properties tell us anything about generative procedures (i.e. whether linear properties are factors in how generative procedures operate). This is key. And Chomsky’s big claim is that G operations are exclusively structure dependent, that this fact about Gs needs to be explained and that the best explanation is that Gs have no capacity to exploit string properties at all. This builds on Reinhart, but is really making a theoretical point about the kinds of rules/operations Gs contain rather than a high level observation about antecedence relations and what licenses them.

So, the absence of linear sensitive operations in the “core” syntax, the mapping from lexical items to “LF” (CI actually, but I am talking informally here) rather than some way of handling the evident linear properties of language, is the key thing that needs explanation.

This is vintage Chomsky reasoning: look for the dogs that aren’t barking and give a principled explanation for why they are not barking. Why no barking strings? Well, if PMs are sets then we expect Gs to be unable to reference linear properties and thus such information should be unable to condition the generative procedures we find in Gs.

Note that this argument has been a cynosure of Chomsky’s most recent thoughts on structure dependence as well. He reiterates his long-standing observation that T to C movement is structure dependent and that no language has a linear dependent analogue (move the “highest” Aux exists but move the “left-most” Aux never does and is in fact never considered an option by kids building English Gs). He then goes on to explain why no G exploits such linear sensitive rules. It’s because the rule writing format for Gs exploits sets and sets contain no linear information. As such, rules that exploit linear information cannot exist, for the information required to write them is un-codeable in the set theoretic “machine language” available for representing structure. In other words, we want sets because the (core) rules of G systematically ignore string properties and this is easily explained if such properties are not part of the G apparatus.
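
Here is a small Python sketch of why the "machine language" matters. The assumptions are all mine: the structure for "the man who is tall is happy" is drastically simplified, `merge` is unlabeled set formation, and the two auxiliaries are tagged `is_1` and `is_2` just to keep the tokens distinct. Given objects built this way, "highest Aux" is a well-defined notion (least depth of embedding), whereas "left-most Aux" cannot even be stated, since a frozenset has no first member.

```python
def merge(a, b):
    """Unlabeled set formation (a toy stand-in for Merge)."""
    return frozenset({a, b})


def depths(obj, pred, depth=0):
    """Yield (depth, item) for every lexical item that satisfies pred."""
    if isinstance(obj, frozenset):
        for member in obj:
            yield from depths(member, pred, depth + 1)
    elif pred(obj):
        yield depth, obj


def highest(obj, pred):
    """Return the least deeply embedded item that satisfies pred."""
    return min(depths(obj, pred), key=lambda pair: pair[0])[1]


def is_aux(word):
    """Hypothetical test for auxiliaries: tokens of 'is'."""
    return word.split("_")[0] == "is"


# "the man who is_1 tall" is the subject; "is_2 happy" is the main predicate.
subject = merge("the", merge("man", merge("who", merge("is_1", "tall"))))
clause = merge(subject, merge("is_2", "happy"))

print(highest(clause, is_aux))
```

The structure dependent rule picks out the main clause auxiliary, the one that fronts in "Is the man who is tall happy?". Writing the linear rule would require first enriching the representation with order, which is exactly the information the sets leave out.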

Observe, btw, that it is a short step from this observation to the idea that linguistic objects are pairings of meanings with sounds (the latter a decidedly secondary feature) rather than a pairing of meanings and sounds (where both interfaces are equally critical). These, as you all know, serve as the start of Chomsky’s argument against communication based conceptions of grammar. So eschewing string properties leads to computational rather than communicative conceptions of FL.

The idea that strings are fundamental to Gs has a long and illustrious history. There is no doubt that empirically word order matters for acceptability and that languages tolerate only a small number of the possible linear permutations. Thus, in some sense, epistemologically speaking, the linear properties of lexical objects are more readily available (i.e. epistemologically simpler) than their hierarchical ones. If one assumes that ontology should follow epistemology or if one is particularly impressed with what one “sees,” then taking strings as basic is hard to resist (and as Lasnik noted, Chomsky did not resist it in his young foolish salad days). In fact, if one looks at Chomsky’s reasoning, strings are discounted not because string properties do not hold (they obviously do) but because the internal mechanics of Gs fail to exploit a class of logically possible operations. This is vintage Chomsky reasoning: look not at what exists, but what doesn’t. Negative data tells us about the structure of particular Gs. Negative G-rules tell us about the nature of UG. Want a pithy methodological precept? Try this: forget the epistemology, or what is sitting there before your eyes, and look at what you never see.

Normally, I would now draw some anti Empiricist methodological morals from all of this, but this time round I will leave it as an exercise for the reader. Suffice it for now to note that it’s those non-barking dogs that tell us the most about grammatical fundamentals.

[1] Again, our friends in physics make an analogous distinction between effective theories (those that are more or less empirically accurate) and fundamental theories (those that are conceptually well grounded). Effective theory is what fundamental theory aims to explain. Using this terminology, Newton’s theory of gravitation is the effective theory that Einstein’s theory of General Relativity derived as a limit case.
[2] Note that conserving the results of earlier inquiry is what allows for the accumulation of knowledge. There is a bad meme out there that linguistics in general (and syntax in particular) “changes” every 5 years and that there are no stable results. This is hogwash. However, the misunderstanding is fed by the inability to appreciate that older theories can be subsumed as special cases by newer ones.  IMO, this has been how syntactic theory has generally progressed, as any half decent Whig history would make clear. See one such starting here and continuing for 4 or 5 subsequent posts.
[3] I am not sure that I would actually strongly endorse this claim as I believe that even failures can be illuminating and that even theories with obvious presuppositional failures can point in the right direction. That said, if one’s aim is “the truth” then a presupposition failure will at best be judged suggestive rather than correct.
[4] For those that care, I proposed concatenation as a primitive here, but it was a very different sense of concatenation, a very misleading sense. I abstracted the operation from string properties. Given the close intended relation between concatenation and strings, this was not a wise move and I hereby apologize.
[5] I have a review of Merge and its set like properties in this forthcoming volume for those that are interested.
[6] One important difference between Kayne’s and Chomsky’s views of linearization is that the LCA is internal to the syntax for the former but is part of the mapping from the syntax proper to the AP interface for the latter. For Kayne, LCA has an effect on LF and derives the basic features of X’ syntax. Not so for Chomsky. Thus, in a sense, linear properties are in the syntax for Kayne but decidedly outside it for Chomsky.
[7] The SS/LSLT version of the embedding transformation was decidedly not cyclic (or at least not monotonic structurally). Note that other conceptions of cyclicity would serve as well, Extension being sufficient, but not necessary.
[8] It’s also not obviously correct. Linear order plays some role in making antecedence possible (think WCO effects) and this is surely true in discourse anaphora. That said, it appears that in Binding Theory proper, c-command (more or less), rather than precedence, is what counts.

Thursday, February 9, 2017

A short note on instrumentalism in linguistics

This note is not mine, but one that Dan Milway sent me (here). He blogged about instrumentalism as the guiding philo of science position in linguistics and argues that adopting it fervently is misguided. I agree. I would actually go farther and question whether instrumentalism is ever a reasonable position to hold. I tend to be realist in my scientific convictions thinking that my theories aim to describe real natural objects and that the aim of data collection is to illuminate the structure of these real objects. I think that this is the default view in physics and IMO what's good enough for physicists is good enough for me (when I can aim that high) so it is my default view in ling.

Dan's view is more nuanced and I believe you will enjoy reacting to it (or not).

Saturday, February 4, 2017

Gallistel rules

There is still quite a bit of skepticism in the cog-neuro community about linguistic representations and their implications for linguistically dedicated grammar specific nativist components. This skepticism is largely fuelled, IMO, by associationist-connectionist (AC) prejudices steeped in a nihilistic Empiricist brew. Chomsky and Fodor and Gallistel have decisively debunked the relevance of AC models of cognition, but these ideas are very very very (very…) hard to dispel. It often seems as if Lila Gleitman was correct when she mooted the possibility that Empiricism is hard wired in and deeply encapsulated, thus impervious to empirical refutation. Even as we speak the default view in cog-neuro is ACish and there is a general consensus in the cog-neuro community that the kind of representations that linguists claim to have discovered just cannot be right for the simple reason that the brain simply cannot embody them.

Gallistel and Matzel (see here) have deftly explored this unholy alliance between associationist psych and connectionist neuro that anchors the conventional wisdom. Interestingly, this anti representationalist skepticism is not restricted to the cog-neuro of language. Indeed, the Empiricist AC view of minds and brains has over the years permeated work on perception and it has generated skepticism concerning mental (visual) maps and their cog-neuro legitimacy.  This is currently quite funny for over the last several years Nobel committees have been falling all over themselves in a rush to award prizes to scientists for the discovery of neural mental maps. These awards are well deserved, no doubt, but what is curious is how long it’s taken the cog-neuro community to admit mental maps as legit hypotheses worthy of recognition.  For a long time, there was quite a bit of excellent behavioral evidence for their existence, but the combo of associationist dogma linked to Hebbian neuro made the cog-neuro community skeptical that anything like this could be so. Boy were they wrong and, in retrospect, boy was this dumb, big time dumb!

Here is a short popular paper (by Kate Jeffery) that goes over some of the relevant history. It traces the resistance to the very idea of mental maps stemming from AC preconceptions. Interestingly, the resistance extended even to the behavioral evidence in favor of these maps (the author discusses Tolman’s work in the late 40s). Here’s a quote (5):

Tolman, however, discovered that rats were able to do things in mazes that they shouldn’t be able to do according to Behaviourism. They could figure out shortcuts and detours, for example, even if they hadn’t learned about these. How could they possibly do this? Tolman was convinced animals must have something like a map in their brains, which he called a ‘cognitive map’, otherwise their ability to discover shortcuts would make no sense. Behaviourists were skeptical. Some years later, when O’Keefe and Nadel laid out in detail why they thought the hippocampus might be Tolman’s cognitive map, scientists were still skeptical.

Why the resistance? Well, ACism prevented conceiving of the possibility. Here’s how Jeffery put it (5-6):

One of the difficulties was that nobody could imagine what a map in the brain would be like. Representing associations between simple things, such as bells and food, is one thing; but how to represent places? This seemed to require the mystical unseen internal ‘black box’ processes (thought and imagination) that Behaviourists had worked so hard to eradicate from their theories. Opponents of the cognitive map theory suggested that what place cells reveal about the brain is not a map, so much as a remarkable capacity to associate together complex sensations such as images, smells and textures, which all happen to come together at a place but aren’t in themselves spatial.

Note that the problem was not the absence of evidence for the position. Tolman presented lots of good evidence. And O’Keefe/Nadel presented more (in fact enough more to get the Nobel prize for the work). Rather the problem was that none of this made sense in an AC framework so the Tolman-O’Keefe/Nadel theory just could not be right, evidence be damned.[1]

What’s the evidence that such maps exist? It involves finding mental circuits that represent spatial metrics, allowing for the calculation of metric inferences (where something is and how far it is from where you are). The two kinds of work that have been awarded Nobels involve place cells and grid cells. The former involve the coding of place, the latter the coding of distance. The article does a nice job of describing what this involves, so I won’t go into it here. Suffice it to say that it appears that Kant (a big deal Rationalist in case you were wondering) was right on target and we now have good evidence for the existence of neural circuits that would serve as brain mechanisms for embodying Kant’s idea that space is a hard wired part of our mental/neural life.

Ok, I cannot resist. Jeffery nicely outlines the challenge that these discoveries pose for ACism. Here’s another quote concerning grid cells (the most recent mental map Nobel here) and how badly it fits with AC dogma (8):[2]

The importance of grid cells lies in the apparently minor detail that the patches of firing (called ‘firing fields’) produced by the cells are evenly spaced. That this makes a pretty pattern is nice, but not so important in itself – what is startling is that the cell somehow ‘knows’ how far (say) 30 cm is – it must do, or it wouldn’t be able to fire in correctly spaced places. This even spacing of firing fields is something that couldn’t possibly have arisen from building up a web of stimulus associations over the life of the animal, because 30 cm (or whatever) isn’t an intrinsic property of most environments, and therefore can’t come through the senses – it must come from inside the rat, through some distance-measuring capability such as counting footsteps, or measuring the speed with which the world flows past the senses. In other words, metric information is inherent in the brain, wired into the grid cells as it were, regardless of its prior experience. This was a surprising and dramatic discovery. Studies of other animals, including humans, have revealed place, head direction and grid cells in these species too, so this seems to be a general (and thus important) phenomenon and not just a strange quirk of the lab rat.

As readers of FL know, this is a point that Gallistel and colleagues have been making for quite a while now and every day the evidence for neural mechanisms that code for spatial information per se grows stronger. Here is another very recent addition to the list, one that directly relates to the idea that dead-reckoning involves path integration. A recent Science paper (here) reports the discovery of neurons tuned to vector properties. Here’s how the abstract reports the findings:

To navigate, animals need to represent not only their own position and orientation, but also the location of their goal. Neural representations of an animal’s own position and orientation have been extensively studied. However, it is unknown how navigational goals are encoded in the brain. We recorded from hippocampal CA1 neurons of bats flying in complex trajectories toward a spatial goal. We discovered a subpopulation of neurons with angular tuning to the goal direction. Many of these neurons were tuned to an occluded goal, suggesting that goal-direction representation is memory-based. We also found cells that encoded the distance to the goal, often in conjunction with goal direction. The goal- direction and goal-distance signals make up a vectorial representation of spatial goals, suggesting a previously unrecognized neuronal mechanism for goal-directed navigation.

So, like place and distance, some brains have the wherewithal to subserve vector representations (goal direction and distance). Moreover, this information is coded by single neurons (not nets) and is available in memory representations, not merely for coding sensory input. As the paper notes, this is just the kind of circuitry relevant to “the vector-based navigation strategies described for many species, from insects to humans (14–19)— suggesting a previously unrecognized mechanism for goal-directed navigation across species” (5).
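
To make "path integration" and "vectorial representation" a bit more concrete, here is a toy Python sketch of the computation, not of the neural mechanism. All names (`integrate_path`, `goal_vector`) and the sample numbers are my own illustrative assumptions; nothing here comes from the Science paper. The first function keeps a running position by summing self-motion steps (dead reckoning); the second turns a position, a heading and a goal location into the pair (distance, direction relative to heading).

```python
import math


def integrate_path(steps):
    """Dead reckoning: sum (heading_in_radians, distance) steps into (x, y)."""
    x = y = 0.0
    for heading, dist in steps:
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
    return x, y


def goal_vector(position, heading, goal):
    """Return (distance, angle) to the goal, angle relative to current heading."""
    dx, dy = goal[0] - position[0], goal[1] - position[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - heading
    # Wrap the angle into the interval [-pi, pi].
    angle = math.atan2(math.sin(angle), math.cos(angle))
    return distance, angle


# Go 3 units along heading 0, then 4 units along heading pi/2.
position = integrate_path([(0.0, 3.0), (math.pi / 2, 4.0)])
distance, angle = goal_vector(position, math.pi / 2, (0.0, 0.0))
print(position, distance, angle)
```

Note what the computation needs: a memory of where the goal is and metric quantities (angles, distances) that are not given by any single sensory input. That is the sense in which the representation is vectorial.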

So we now have a whole series of neurons tuned to abstracta like place, distance, goal, angle of rotation, and magnitude, neurons that plausibly subserve behavior that has long been noted to implicate just such neural circuits. Once again, the neuroscience is finally catching up with the cognitive science. As with parents, the more neuroscience matures the smarter classical cognitive science becomes.

Let me emphasize this point, one that Gallistel has forcefully made but is worth repeating at every opportunity until we can cleanly chop off the Empiricist zombie’s head. Cognitive data gets too little respect in the cog-neuro world. But in those areas where real progress has been made, we repeatedly find that the cog theories remain intact even as the neural ones change dramatically. And not only cog-neuro theories. The same holds for the relation of chemistry to physics (as Chomsky noted) and genetics to biochemistry (as Gallistel has observed). It seems that more often than not what needs changing is the substrate theory not the reduced theory. The same scenario is being repeated again in the cog-neuro world. We actually know very little about brain hardware circuitry and we should stop assuming that ACish ideas should be given default status when we consider ways of unifying cognition with neuroscience.

Consider one more interesting paper that hits a Gallistel theme, but from a slightly different angle. I noted that the Science paper found single neurons coding for abstract spatial (vectorial) information. There is another recent bit of work (here) that ran across my desk[3] that also has a high Gallistel-Intriguing (GI) index.

It appears that slime molds can both acquire info about their environment and can pass this info on to other slime molds. What’s interesting is that these slime molds are unicellular, thus the idea that learning in slime molds amounts to fine tuning a neural net cannot be correct. Thus whatever learning is in this case must be intra-, not inter-cellular. And this supports the idea that one has intra cellular cognitive computations. Furthermore, when slime molds “fuse” (which they apparently can do, and do do) the information that an informed slime mold has can transfer to its fused partner. This supports the idea that learning can be a function of the changed internal state of a uni-cellular organism.

This is clearly grist for the Gallistel-King conjecture (see here for some discussion) that (some) learning is neuron, not net, based. The arguments that Gallistel has given over the years for this view have been subtle, abstract and quite arm-chair (and I mean this as a compliment). It seems that as time goes by, more and more data that fits this conception comes in. As Gallistel (and Fodor and Pylyshyn as well) noted, representational accounts prefer certain kinds of computer architectures over others (Turing-von Neumann architectures). These classical computer architectures, we have been told, cannot be what brains exploit. No, brains, we are told repeatedly, use nets and computation is just the Hebb rule with information stored in the strength of the inter-neuronal connections. Moreover, this information is very ACish with abstracta at best emergent, rather than endogenous features of our neural make-up. Well, this seems to be wrong. Dead wrong. And the lesson I draw from all of this is that it will prove wrong for language as well. The sooner we dispense with ACism, the sooner we will start making some serious progress. It’s nothing but a giant impediment, and has proven to be so again and again.

[1] This is a good place to remind you of the difference between Empiricist and empirical. The latter is responsiveness to evidence. The former is a theory (which, IMO, given its lack of empirical standing has become little more than a dogma).
[2] It strikes me as interesting that this sequence of events reprises what took place in studies of the immune system. Early theories of antibody formation were instructionist because how could the body natively code for so many antibodies? As work progressed, Nobel prizes streamed to those that challenged this view and proposed selectionist theories wherein the environment selected from a pre-specified innately generated list of options (see here). It seems that the less we know, the greater the appeal of environmental conceptions of the origin of structure (Empiricism being the poster child for this kind of thinking). As we come to know more, we come to understand how rich is the contribution of the internal structure of the animal to the problem at hand. Selectionism and Rationalism go hand in hand. And this appears to be true for both investigations of the body and the mind.
[3] Actually, Bill Idsardi feeds me lots of this, so thx Bill.

Tuesday, January 31, 2017

Some short reads

Here are a couple of things to read that I found amusing.

The first two concern a power play by Elsevier that seems serious. The publisher seems to be about to get into a big fight with countries about access to their journals. Nature reports that Germany, Taiwan and Peru will soon have an Elsevier embargo placed on them, the journals that it publishes no longer available to scientists in these countries. This seems to me a big deal, and I suspect that this will be a turning point in open access publishing. However big Elsevier is, were I their consigliere, I would counsel not getting into fights with countries, especially ones with big scientific establishments.

There is more in fact. It seems that Elsevier is also developing its own impact numbers, ones that make its journals look better than the other numbers do (see here). Here's one great quote from the link: "seeing a publisher developing its own metrics sounds about as appropriate as Breitbart news starting an ethical index for fake news."

Embargoing countries and setting up one's own impact metric; seems like fun times.

Here is a second link that I point to just for laughs and because I am a terrible technophobe. Many of my colleagues are LaTeX fans. A recent paper suggests that whatever its other virtues, LaTeX is a bit of a time sink. Take a look. Here's the synopsis:

To assist the research community, we report a software usability study in which 40 researchers across different disciplines prepared scholarly texts with either Microsoft Word or LaTeX. The probe texts included simple continuous text, text with tables and subheadings, and complex text with several mathematical equations. We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software.

Ok, I admit it, my schadenfreude is tingling.

A small addendum to the previous post on Syntactic Structures

Here’s a small addition to the previous post prompted by a discussion with Paul Pietroski. I am always going on about how focusing on recursion in general as the defining property of FL is misleading. The interesting feature of FL is not that it produces (or can produce) recursive Gs but that it produces the kinds of recursive Gs that it does. So the minimalist project is not to explain how recursion arises in humans but how a specific kind of recursion arises in the species. What kind? Well the kind we find in the Gs we find. What kind are these? Well not FSGs nor CFGs, or at least this is what Syntactic Structures (SS) argues.

Let me put this another way: GG has spent the last 60 years establishing that human Gs have a certain kind of recursive structure. In SS, it argued for a transformational grammar arguing that FSGs (which were recursive) were inherently too weak and that PSGs (also recursive) were inadequate empirically. Transformational Gs, SS argued, are the right fit.
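
For a feel of what "inherently too weak" means, here is a toy Python illustration using the standard textbook language of matched a's and b's (ab, aabb, aaabbb, ...), the kind of nested dependency pattern used to show that finite state devices fall short. The function name `is_anbn` is my own; the sketch simply implements the recursive phrase structure rule S -> a S b (or nothing) directly.

```python
def is_anbn(s):
    """Recognize a^n b^n (n >= 0) via the recursive rule S -> a S b | empty."""
    if s == "":
        return True
    # Peel off one matched outer pair, then recurse on what is inside.
    return len(s) >= 2 and s[0] == "a" and s[-1] == "b" and is_anbn(s[1:-1])


print(is_anbn("aabb"))  # matched nesting
print(is_anbn("aab"))   # one b short
print(is_anbn("abab"))  # right symbols, wrong dependencies
```

A device with a fixed finite number of states cannot keep track of an unbounded number of pending a's, which is why this pattern is out of reach for FSGs while a single recursive PS rule handles it. Both kinds of G are "recursive" in the loose sense, which is exactly why recursion per se is not the interesting property.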

So, when people claim that the minimalist problem is to explain the sources of recursion or observe that there may be/is recursion in other parts of cognition thereby claiming to “falsify” the project, it seems to me that they are barking up a red herring (I love the smell of mixed metaphors in the morning!). From where I sit, the problem is explaining how an FL that delivers TGish recursive Gs arose, as this is the kind of FL that we have and these are the kinds of Gs that it delivers. SS makes clear that “in the earliest days of GG,” not all recursive Gs are created equal and that the FL and Gs of interest have specific properties. It’s the sources for this kind of recursion we want to explain. This is worth bearing in mind when issues of recursion (and its place in minimalist theory) make it to the spotlight.

Friday, January 27, 2017

On Syntactic Structures

I am in the process of co-editing a volume on Syntactic Structures (SS) that is due out in 2017 to celebrate (aka, exploit) the 60th anniversary of the publication of this seminal work. I am part of a gang of four (the other culprits being Howard Lasnik, Pritti Patel, Charles Yang, supervised/inspired by Norbert Corver). We have solicited about 15 shortish pieces on various themes. The tentative title is something like The continuing relevance of Syntactic Structures to GG. Look for it on your newsstands sometime late next year. It should arrive just in time for the 2017 holidays and is sure to be a great Xmas/Hanukkah/Kwanzaa gift. As preparation for this editorial escapade, I have re-read SS several times and have tried to figure out for myself what its lasting contribution is. Clearly it is an important historical piece as it sparked the Generative Enterprise. The question remains: What SS ideas have current relevance? Let me mention five.

The first and most important idea centers on the aims of linguistic theory (ch 6). SS contrasts the study of grammatical form and the particular internal (empirically to be determined) “simplicity” principles that inform it with discovery procedures that are “practical and mechanical” (56) methods that “an investigator (my emph, NH) might actually use, if he had the time, to construct a grammar of the language directly from the raw data” (52). SS argues that a commitment to discovery procedures leads to strictures on grammatical analysis (e.g. bans on level mixing) that are methodologically and empirically dubious.

The discussion in SS is reminiscent of the well-known distinction in the philo of science between the context of discovery and the context of justification. How one finds one’s theory can be idiosyncratic and serendipitous; justifying one’s “choice” is another matter entirely. SS makes the same point.[1] It proposes a methodology of research in which grammatical argumentation is more or less the standard of reasoning in the sciences more generally: data plus general considerations of simplicity are deployed to argue for the superiority of one analysis over another. SS contrasts this with the far stronger strictures Structuralists endorsed, principles which if seriously practiced would sink most any serious science. In practice then, what SS is calling for is that linguists act like regular scientists (in modern parlance, reject methodological dualism).

Let me be a bit more specific. The structuralism that SS was arguing against took as a methodological dictum that the aim of analysis was to classify a corpus into a hierarchy of categories conditioned by substitution criteria. So understood, grammatical categories are classes of words, which are definable as classes of morphemes, which are definable as classes of phonemes, which are definable as classes of phones. The higher levels are, in effect, simple generalizations over lower level entities. The thought was that higher level categories were entirely reducible to lower level distributional patterns. In this sort of analysis, there are no (and can be no) theoretical entities, in the sense of real abstract constructs that have empirical consequences but are not reducible or definable in purely observational terms. By arguing against discovery procedures and in favor of evaluation metrics SS is in effect arguing for the legitimacy of theoretical linguistics. Or, more accurately, for the legitimacy of normal scientific inquiry into language without methodological constrictions that would cripple physics were they applied there.

Let me put this another way: Structuralism adopted a strong Empiricist methodology in which theory was effectively a summary of observables. SS argues for the Rationalist conception of inquiry in which theory must make contact with observables, but is not (and cannot be) reduced to them. Given that the Rationalist stance simply reflects common scientific practice, SS is a call for linguists to start treating language scientifically and not hamstring inquiry by adopting unrealistic, indeed non-scientific, dicta. This is why SS (and GG) is reasonably seen as the start of the modern science of linguistics.

Note that the discussion here in SS differs substantially from that in chapter 1 of Aspects, though there are important points of contact.[2] SS is Rationalist as concerns the research methodology of linguists. Aspects is Rationalist as concerns the structure of human mind/brains. The former concerns research methodology. The latter concerns substantive claims about human neuro-psychology.

That said, there are obvious points of contact. For example, if discovery procedures fail methodologically, then this strongly suggests that they will also fail as theories of linguistic mental structures. Syntax, for example, is not reducible to properties of sound and/or meaning despite its having observable consequences for both. In other words, the Autonomy of Syntax thesis is just a step away from the rejection of discovery procedures. It amounts to the claim that syntax constitutes a viable G level that is not reducible to the primitives and operations of any other G level.

To beat this horse good and dead: Gs contain distinct levels that interact with empirically evaluable consequences, but they are not organized so that higher levels are definable in terms of generalizations over lower level entities. Syntax is real. Phonology is real. Semantics is real. Phonetics is real. These levels have their own primitives and principles of operation. The levels interact, but are ontologically autonomous. Given the modern obsession with deep learning and its implicit endorsement of discovery procedures, this point is worth reiterating and keeping in mind. The idea that Gs are just generalizations over generalizations over generalizations, which seems to be the working hypothesis of Deep Learners and others,[3] has a wide following nowadays, so it is worth recalling the SS lesson that discovery procedures both don’t work and are fundamentally anti-theoretical. It is Empiricism run statistically amok!

Let me add one more point and then move on. How should we understand the SS discussion of discovery procedures from an Aspects perspective given that they are not making the same point? Put more pointedly, don’t we want to understand how a LAD (aka, kid) goes from PLD (a corpus) to a G? Isn’t this the aim of GG research? And wouldn’t such a function be a discovery procedure?

Here’s what I think: Yes and no. What I mean is that SS makes a distinction that is important to still keep in mind. Principles of FL/UG are not themselves sufficient to explain how LADs acquire Gs. More is required. Here’s a quote from SS (56):

Our ultimate aim is to provide an objective, non-intuitive way to evaluate a grammar once presented, and to compare it with other proposed grammars. We are thus interested in describing the form of grammars (equivalently, the nature of linguistic structure) and investigating the empirical consequences of adopting a certain model for linguistic structure, rather than in showing how, in principle, one might have arrived at the grammar of a language.

Put in slightly more modern terms: finding FL/UG does not by itself provide a theory of how the LAD actually acquires a G. More is needed. Among other things, we need accounts of how we find phonemes, and morphemes and many of the other units of analysis the levels require. The full theory will be very complex, with lots of interacting parts. Many mental modules will no doubt be involved. Understanding that there is a peculiarly linguistic component to this story does not imply forgetting that it is not the whole story. SS makes this very clear. However, focusing on the larger problem often leads to ignoring the fundamental linguistic aspects of the problem, what SS calls the internal conditions on adequacy, many/some of which will be linguistically proprietary.[4]

So, the most important contribution of SS is that it launched the modern science of linguistics by arguing against discovery procedures (i.e. methodological dualism). And sadly, the ground that SS should have cleared is once again infested. Hence, the continuing relevance of the SS message.

Here are four more ideas of continuing relevance.

First, SS shows that speaker intuitions are a legitimate source of linguistic data. The discussions of G adequacy in the first several chapters are all framed in terms of what speakers know about sentences. Indeed, that Gs are models of human linguistic behavior over an unbounded domain is quite explicit (15):

…a grammar mirrors the behavior of speakers who, on the basis of a finite and accidental experience with language, can produce or understand an indefinite number of new sentences. Indeed, any explication of “grammatical in L” …can be thought of as offering an explanation for this fundamental aspect of linguistic behavior.

Most of the data presented for choosing one form of G over another involves plumbing a native speaker’s sense of what is and isn’t natural for his/her language. SS has an elaborate discussion of this in chapter 8 where the virtues of “constructional homonymity” (86) as probes of grammatical adequacy are elaborated. Natural languages are replete with sentences that have the same phonological form but differ thematically (flying planes can be dangerous) or that have different phonological forms but are thematically quite similar (John hugged Mary, Mary was hugged by John). As SS notes (83): “It is reasonable to expect grammars to provide explanations for some of these facts” and for theories of grammar to be evaluated in terms of their ability to handle them.
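For the computationally inclined, here is a minimal sketch of what constructional homonymity looks like once a G is written down. The toy grammar and its category labels (Adj, Ving, VP1 and so on) are my own illustrative assumptions, not the analysis SS gives; the point is only that one string of words can receive two distinct structural descriptions, which a simple chart parser will enumerate.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form. The labels are
# illustrative assumptions, not the categories used in SS.
LEXICON = {
    "flying": ["Adj", "Ving"],  # modifier reading vs. gerund reading
    "planes": ["N"],
    "can": ["Aux"],
    "be": ["Cop"],
    "dangerous": ["AP"],
}
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Adj", "N"),    # planes that fly
    ("NP", "Ving", "N"),   # the act of flying planes
    ("VP", "Aux", "VP1"),
    ("VP1", "Cop", "AP"),
]


def parses(sentence):
    """Return every bracketed S-tree the toy grammar assigns to the sentence."""
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(list)  # (start, end, category) -> list of trees
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[(i, i + 1, cat)].append(f"({cat} {word})")
    # Build wider spans out of pairs of adjacent narrower spans (CKY style).
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in RULES:
                    for lt in chart.get((start, split, left), []):
                        for rt in chart.get((split, end, right), []):
                            chart[(start, end, parent)].append(
                                f"({parent} {lt} {rt})"
                            )
    return chart.get((0, n, "S"), [])


if __name__ == "__main__":
    for tree in parses("flying planes can be dangerous"):
        print(tree)
```

Under these assumptions the parser returns two trees for the one string, differing only in how flying planes is analyzed. A G that assigned the string a single structure would have nothing to say about why speakers hear it two ways.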

It is worth noting that the relevance of constructional homonymity to “debates” about structure dependence has been recently highlighted once again in a paper by Berwick, Pietroski, Yankama and Chomsky (see here and here for discussion). It appears that too many forget that linguistic facts go beyond the observation that “such and such a string…is or is not a sentence” (85). SS warns against forgetting this, and the world would be a better place (or at least dumb critiques of GG would be less thick on the ground) if this warning 60 years ago had been heeded.

Second, SS identifies the central problem of linguistics as how to relate sound and meaning (the latter being more specifically thematic roles (though this term is not used)). This places Gs and their structure at the center of the enterprise. Indeed, this is what makes constructional homonymity such an interesting probe into the structure of Gs. There is an unbounded number of these pairings and the rules that pair them (i.e. Gs) are not “visible.” This means the central problem in linguistics is determining the structure of these abstract Gs by examining their products. Most of SS exhibits how to do this and the central arguments in favor of adding transformations to the inventory of syntactic operations involve noting how transformational grammars accommodate such data in simple and natural ways.

This brings us to the third lasting contribution of SS. It makes a particular proposal concerning the kind of G natural languages embody. The right G involves Transformations (T). Finite State Gs don’t cut it, nor do simple context free PSGs. T-grammars are required. The argument against PSGs is particularly important. It is not that they cannot generate the right structures but that they cannot do so in the right way, capturing the evident generalizations that Gs embodying Ts can.
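To make the "right way" point concrete, here is a rough sketch of the SS passive T, which maps a string of the form NP1 - Aux - V - NP2 onto NP2 - Aux + be + en - V - by + NP1. The tuple representation, the function names and the tiny spell-out table are all my own assumptions for illustration; the affix step is a crude stand-in for the SS rule that hops an affix onto the verbal element that follows it.

```python
# Affixes that must attach to the verbal element following them.
AFFIXES = {"past", "en"}

# Hypothetical spell-out table covering only the forms used in the example.
SPELL_OUT = {
    ("hug", "past"): "hugged",
    ("hug", "en"): "hugged",
    ("be", "past"): "was",
}


def active(underlying):
    """Leave the underlying string NP1 - Aux - V - NP2 as it is."""
    np1, aux, verb, np2 = underlying
    return [np1, *aux, verb, np2]


def passive(underlying):
    """NP1 - Aux - V - NP2  ->  NP2 - Aux + be + en - V - by + NP1."""
    np1, aux, verb, np2 = underlying
    return [np2, *aux, "be", "en", verb, "by", np1]


def spell_out(morphemes):
    """Attach each affix to the next element, then look up the word form."""
    words, i = [], 0
    while i < len(morphemes):
        if morphemes[i] in AFFIXES and i + 1 < len(morphemes):
            words.append(SPELL_OUT[(morphemes[i + 1], morphemes[i])])
            i += 2
        else:
            words.append(morphemes[i])
            i += 1
    return " ".join(words)


if __name__ == "__main__":
    underlying = ("John", ["past"], "hug", "Mary")
    print(spell_out(active(underlying)))
    print(spell_out(passive(underlying)))
```

Both sentences come from one underlying string, so whatever restricts which NPs can go with hug need only be stated once. A simple PSG can generate both strings too, but only by listing the active and passive patterns separately and thereby missing that they are related.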

Isolating Ts as grammatically central operations sets the stage for the next 50 years of inquiry: specifying the kinds of Ts required and figuring out how to limit them so that they don’t wildly overgenerate.

SS also proposes the model that until very recently was at the core of every GG account. Gs contained a PSG component that generated kernel sentences (which effectively specified thematic dependencies) and a T component that created further structures from these inputs. Minimalism has partially stuck to this conception. Though it has (or some versions have) collapsed PSG kinds of rules and T rules treating both as instances of Merge, minimalist theories have largely retained the distinction between operations that build thematic structure and those that do everything else. So, even though Ts and PSG rules are formally the same, thematic information (roughly the info carried by kernel sentences in SS) is the province of E-merge applications and everything else the province of I-merge applications. The divide between thematic information and all other kinds of semantic information (aka the duality of interpretation) has thus been preserved in most modern accounts.[5]

Last, SS identifies two different linguistic problems: finding a G for a particular L and finding a theory of Gs for arbitrary L. This can also be seen as explicating the notion “grammatical in L” for a given language L vs the notion of “grammatical” tout court. This important distinction survives to the present as the difference between Gs and FL/UG. SS makes it clear (at least to me) that the study of the notion grammatical in L is interesting to the degree that it serves to illuminate the more general notion grammatical for arbitrary L (i.e. Gs are interesting to the degree that they illuminate the structure of FL/UG). As a practical matter, the best route into the more general notion proceeds (at least initially) via the study of the properties of individual Gs. However, SS warns against thinking that a proper study of the more general notion must await the development of fully adequate accounts of the more specific.

Indeed, I would go further. The idea that investigations of the more general notion (e.g. of FL/UG) are parasitic on (and secondary to) establishing solid language particular Gs is to treat the more general notion (UG) as the summary (or generalization) of properties of individual Gs. In other words, it is to treat UG as if it were a kind of Structuralist level, reducible to the properties of individual Gs. But if one rejects this conception, as the SS discussion of levels and discovery procedures suggests we should, then prioritizing G facts and investigation over UG considerations is a bad way to go.

I suspect that the above conclusion is widely appreciated in the GG community with only those committed to a Greenbergian conception of Universals dissenting. However, the logic carries over to modern minimalist investigation as well. The animus against minimalist theorizing can, IMO, be understood as reflecting the view that such airy speculation must play second fiddle to real linguistic (i.e. G based) investigations. SS reminds us that the hard problem is the abstract one and that this is the prize we need to focus on, and that it will not just solve itself if we concentrate only on the “lower” level issues. It might solve itself if the world were fundamentally Structuralist, with higher levels of analysis just being generalizations of lower levels. But SS argues repeatedly that this is not right. It is a message that we should continue to rehearse.

Ok, that’s it for now. SS is chock full of other great bits and the collection we are editing will, I am confident, bring them out. Till then, let me urge you to (re)read SS and report back on your favorite parts. It is excellent holiday reading, especially if read responsively accompanied by some good wine.

[1] What follows uses the very helpful and clear discussion of these matters by John Collins (here): 26-7.
[2] Indeed, the view in Aspects is clearly prefigured in SS, though it is not as highlighted in SS as it is later on (see discussion p. 15).
…a grammar mirrors the behavior of speakers who, on the basis of a finite and accidental experience with language, can produce or understand an indefinite number of new sentences. Indeed, any explication of “grammatical in L” …can be thought of as offering an explanation for this fundamental aspect of linguistic behavior.
[3] Elissa Newport’s work seems to be in much the same vein in treating everything as probability distributions over lower level entities bottoming out in something like syllables or phones.
[4] Of course, the ever hopeful minimalist will hope that not very much will be such.
[5] I would be remiss if I did not point out that this is precisely the assumption that the movement theory of control rejects.