Faculty of Language: Universal tendencies

Monday, October 24, 2016

Universal tendencies

Let’s say we find two languages displaying a common pattern, or two languages converging towards a common pattern, or even all languages doing the same. How should we explain this? Stephen Anderson (here, and discussed by Haspelmath here) notes that if you are a GGer there are three available options: (i) the nature of the input, (ii) the learning theory and (iii) the cognitive limits of the LAD (be they linguistically specific or domain general). Note that (ii) will include (iii) as a subpart and will have to reflect the properties of (i) but will also include all sorts of other features (cognitive control, structure of memory and attention, the number of options the LAD considers at one time etc.). These, as Anderson notes, are the only options available to a GGer, for s/he takes G change to reflect the changing distribution of Gs in the heads of a population of speakers. Or, to put this more provocatively: languages don't exist apart from their incarnation in speakers’ minds/brains. And given this, all diachronic “laws” (laws that explain how languages or Gs change over time) must reflect the cognitive, linguistic or computational properties of human minds/brains.

This said, Haspelmath (H) observes (here and here) (correctly in my view) that GGers have long “preferred purely synchronic ways of explaining typological distributions,” and by this he means explanations that allude to properties of the “innate Language Faculty” (see here for discussion). In other words, GGers like to think that typological differences reflect intrinsic properties of FL/UG and that studying patterns of variation will hence shed light on its properties. I have voiced some skepticism concerning this “hence” here. In what follows I would like to comment on H’s remarks on a similar topic. However, before I get into details I should note that we might not be talking about the same thing. Here’s what I mean.

The way I understand it, FL/UG bears on properties of Gs not on properties of their outputs. Hence, when I look at typology I am asking how variation in typologies and historical change might explain changes in Gs. Of course, I use outputs of these Gs to try to discern the properties of the underlying Gs, but what I am interested in is G variation not output variation. This concedes that one might achieve similar (identical?) outputs from different congeries of G rules, operations and filters. In effect, whereas changing surface patterns do signal some change in the underlying Gs, similarity of surface patterns need not signal similarity in the underlying Gs. Moreover, given our current accounts there are (sadly) too many roads to Rome, thus the fact that two Gs generate similar outputs (or have moved towards similar outputs from different Gish starting points) does not imply that they must be doing so in the same way. Maybe they are and maybe not. It really all depends.

Ok back to H. He is largely interested in the (apparent) fact (and let’s stipulate that H is correct) that there exist “recurrent paths of changes,” “near universal tendencies” (NUT) that apply in “all or a great majority of languages.”[1] He is somewhat skeptical that we have currently identified diachronic mechanisms to explain such changes and thinks that those on the market do not deliver: “It seems clear to me that in order to explain universal tendencies one needs to appeal to something stronger than “common paths of change,” namely change constraints, or, mutational constraints…” I could not agree more. That there exist recurrent paths of change is a datum that we need mechanisms to explain. It is not yet a complete explanation. Huh?

Recall, we need to keep our questions clear. Say that we have identified an actual NUT (i.e. we have compelling evidence that certain kinds of G changes are “preferred”). If we have this and we find another G changing in the same direction then we can attribute this to that same NUT. So we explain the change by so attributing it. Well, in part: we have identified the kind of thing it is even if we do not yet know why these types of things exist. An analogy: I have a pencil in my hand. I open my hand. The pencil falls. Why? Gravitational attraction. I then find out that the same thing happens when I have a pen, an eraser, a piece of chalk (yes, this horse is good and dead!) and any other school supply at hand. I conclude that these falls are all instances of the same causal power (i.e. gravity). When I pick up a thumbtack and let it loose and it too falls, have I explained the fall by saying that it falls because of gravity? Well, up to a point. A small point IMO, but a point nonetheless. Of course we want to know how Gravity does this, what exactly it does when it does it and even why it does it the way that it does, but classifying phenomena into various explanatory pots is often a vital step in setting up the next step of the investigation (viz. identifying and explaining the properties of the alleged underlying “force”).

This said, I agree that the explanation is pretty lame if left like this. Why did X fall when I dropped it? Because everything falls when you drop it. Satisfied? I hope not.

Sadly, from where I sit, many explanations of typological difference or diachronic change have this flavor. In GG we often identify a parameter that has switched value and (more rarely) some PLD that might have led to the switch. This is devilishly hard to do right and I am not dissing this kind of work. However, it is often very unsatisfying given how easy it is to postulate parameters for any observable difference. Moreover, very few proposals actually do the hard work of sketching the presupposed learning theory that would drive the change or looking at the distribution of PLD that the learning theory would evaluate in making the change. To get beyond the weak explanations noted above, we need more robust accounts of the nature of the learning mechanisms and the data that was input to them (PLD) that led to the change.[2] Absent this, we have an explanation of only a very weak sort.

Would H agree? I think so, but I am not absolutely sure of this. I think that H runs together things that I would keep separate. For example: H considers Anderson’s view that many synchronic features of a G are best seen as remnants of earlier patterns. In other words, what we see in particular Gs might be reflections of “the shaping effects of history” and “not because the nature of the Language Faculty requires it” (H quoting Anderson: p. 2). H rejects this for the following reason: he doesn’t see “how the historical developments can have ‘shaping effects’ if they are ‘contingent’” (p. 2). But why not? What does the fact that something is contingent have to do with whether it can be systematically causal? 1066 and all that was contingent, yet its effects on “English” Gs have been long lasting. There is no reason to think that contingent events cannot have long lasting shaping effects.

Nor, so far as I can tell, is there reason to think that this only holds for G-particular “idiosyncrasies.” There is no reason in principle why historical contingencies might not explain “universal tendencies.” Here’s what I mean.

Let’s for the sake of argument assume that there are around 50 different parameters (and this number is surely small). Taking each parameter to be binary and the parameters to be independent, this gives a space of 2^50 possible Gs, or about 1.1 × 10^15. The current estimate of different languages out there (and I assume, maybe incorrectly, Gs) is on the order of 7,000, at least that’s the number I hear bandied about among typologists. This number is minuscule. It covers less than a billionth of a percent of the possible space. It is not inconceivable that languages in this part of the space have many properties in common purely because they are all in the same part of the space. These common properties would be contingent in a UG sense if we assumed that we only accidentally occupy this part of the space. Or, had we been dropped into another part of the G space we would have developed Gs without these properties. It is even possible that it is hard to get to any other of the G possibilities given that we are in this region. On this sort of account, there might be many apparent universals that have no deep cognitive grounding and are nonetheless pervasive. Don’t get me wrong, I am not saying these exist, only that we really have no knock down reason for thinking they do not. And if something like this could be true, then the fact that some property did or didn’t occur in every G could be attributed to the nature of the kind of PLD our part of the G space makes available (or how this kind of PLD interacts with the learning algorithm). This would fit with Anderson’s view: contingent yet systematic and attributable to the properties of the PLD plus learning theory.
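
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes that the 50 parameters are binary and independent (the binary part is an assumption, since nothing above fixes how many values a parameter can take), and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope size of the G space.
# Assumption: 50 independent parameters, each with two possible values.
NUM_PARAMETERS = 50
VALUES_PER_PARAMETER = 2      # assumed binary; raise this to enlarge the space
ATTESTED_LANGUAGES = 7_000    # rough figure bandied about among typologists


def grammar_space_size(num_parameters: int, values_per_parameter: int = 2) -> int:
    """Number of distinct parameter settings when parameters are independent."""
    return values_per_parameter ** num_parameters


def share_attested(attested: int, space_size: int) -> float:
    """Fraction of the possible space covered by the attested languages."""
    return attested / space_size


space = grammar_space_size(NUM_PARAMETERS, VALUES_PER_PARAMETER)
share = share_attested(ATTESTED_LANGUAGES, space)

print(f"possible Gs: {space:,}")
print(f"share attested: {share:.1e} (as a percentage: {share * 100:.1e}%)")
```

If the parameters are not independent, or some take more than two values, the size of the space changes, but under any setting in this neighborhood the attested languages remain a vanishingly small sample of it.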

I don’t think that H (or most linguists) would find this possibility compelling. If something is absent from 7,000 languages (7,000 I tell you!!!) then this could not be an accident! Well maybe not. My only claim is that the basis for this confidence is not particularly clear. And thinking through this scenario makes it clear that gaps in the existing language patterns/Gs are (at best) suggestive about FL/UG properties rather than strongly dispositive. It could be our ambient PLD that is responsible. We need to see the reasoning. Culbertson and Adger provide a nice model for how this might be done (see here).

One last point: what makes PoS arguments powerful is that they are not subject to this kind of sampling skepticism. PoS arguments really do, if successful, shed direct light on FL/UG. Why? Because, if correctly grounded, PoS arguments abstract away from PLD altogether and so remove this as a causal source of systematicity. Hence, PoS arguments short-circuit the skeptical suggestions above. Of course, the two kinds of investigation can be combined. However, it is worth keeping in mind that typological investigations will always suffer from the kind of sampling problem noted above and will thus be less direct probes of FL/UG than will PoS considerations. This suggests, IMO, that it would be very good practice to supplement typologically based conclusions with PoS style arguments.[3] Even better would be explicit learning models, though these will be far more demanding given how hard it likely is to settle on what the PLD is for any historical change.[4]

I found H’s discussion of these matters to be interesting and provocative. I disagree with many things that H says (he really is focused on languages rather than Gs). Nonetheless, his discussion can be translated well enough into my own favored terms to be worth thinking about. Take a look.

[1] I say ‘apparent’ for I know very little of this literature though I am willing to assume H is correct that these exist for the sake of argument.

[2] Which does not mean that we lack nice models of what better accounts might look like. Bob Berwick, Elan Dresher, Janet Fodor, Jeff Lidz, Lisa Pearl, William Sakas, Charles Yang, a.o., have provided excellent models of what such explanations would look like.

[3] Again a nice example of this is Culbertson and Adger’s work discussed here. It develops an artificial G argument (meatier than a simple PoS argument) to more firmly ground a typological conclusion.

[4] Hard, but not impossible as the work of Kroch, Lightfoot and Roberts, for example, shows.

11 comments:

Unknown, October 25, 2016 at 12:00 AM
Are not all parameters binary? In that case would not the number of possible Gs be 2^50, (i.e., more than 10^15)?
Unknown, October 25, 2016 at 9:32 AM
Are you saying that the FoL may allow Gs that have/lack certain properties that none of the languages we have access to actually instantiate?
Omer, October 25, 2016 at 11:11 AM
I agree that something that's absent from the 6000-7000 languages we see is not in principle guaranteed to be ruled out by our mental capacities (be they linguistic or otherwise). This is a methodological heuristic, and I think it's one that has served us well. I personally find this heuristic to be much more reasonable than the one that underlies work in the artificial grammar paradigm – namely, that how adults treat novel linguistic data is relevant to how children do so.

(And I agree with the first commenter regarding the math: 2^50 is 1,125,899,906,842,624.)

Unknown, October 26, 2016 at 12:41 AM
How much does the approximation of the number of parameters vary? Is there not a risk of circularity if the parameters are derived from what the Gs of the languages of the world look like, and then the space of possible Gs is approximated from this number of parameters? Or am I missing something?