Faculty of Language: Bayes Daze Translation Bleus

Monday, September 2, 2013

Bayes Daze Translation Bleus

With September on us, shadows lengthen at dusk, and thoughts turn to school, Bayes Daze II. Recall from BD I that you, dear readers, had a homework assignment: run Le pomme a mangé le garçon (‘the apple ate the boy’)[1] through French-to-English Google Translate and see what pops out on the English end. And the answer? Well, surprise! The Google sausage machine spits out, The boy ate the apple. But…Why? Well, this sin can be laid directly at the feet of Reverend Bayes. Dissecting this behavior a bit further and reverse-engineering Google Translate is a great exercise for those who aren’t familiar with the Bayes biz. (If you already know this biz, you might want to skip it all, but even so you might still find the explanation intriguing.) Don’t worry, you have nothing to fear; if you innocently violate some Google patent by figuring this out, we all know the motto “Don’t be evil.” Hah, right.

OK, so let F1 = our specific French sentence to translate, i.e., le pomme a mangé le garçon. Now, how to find the English sentence E that’s the ‘best’ translation of F1? Well, what’s ‘best’? Let’s say ‘best’ means ‘maximum probability,’ i.e., the most likely translation of F1. But what probability do we want to maximize? The simplest idea is that this should be the maximum conditional probability, p(E|F1), the probability of E given F1 – that is, let E run over all English sentences, and pick the English sentence that maximizes p(E|F1) as the ‘best translation’.[2] And it’s here that we invoke the dear Reverend, rewriting p(E|F1) via Bayes’ Rule as: p(F1|E) x p(E)/p(F1). So, our job now is to find the particular E sentence that maximizes this new formula. Note that since F1 (le pomme a…) is fixed, maximizing p(F1|E) x p(E)/p(F1) is the same as maximizing just its numerator p(F1|E) x p(E),[3] so we can ignore p(F1) and just maximize this product in the numerator. Why do we do this instead of just figuring out the maximum for p(E|F1) directly? Well, it’s the familiar strategy of divide-and-conquer: if we tried to maximize p(E|F1) directly, we would need very, very, very good estimates for all these conditional probabilities. Too hard! We can get a better translation by splitting this probability into the two parts, the first p(F1|E) and the second p(E), even if these two probability estimates aren’t very good individually.

How so? Well, suppose we decide to give a high likelihood value to p(F1|E) only if the words in F1 are generally translations of words in E, where the words in F1 can be in any order. Second, we assign p(E) a high likelihood if the sentence E is ‘grammatical.’ (We will say what this comes to momentarily.) Now when we put these two probability estimates together, look what happens! The factor p(F1|E) will help guarantee that a good E will have words that usually translate to the words in F1, irrespective of their order. So, if E=the boy ate has a high probability score, then so will E=the ate boy. Some word orders are good English and others aren’t. The factor p(E) has the job of lowering the probability of the ‘bad order’ sentences. As Kevin Knight puts it in his tutorial (from which I have unashamedly cribbed)[4], “p(E) worries about word order so that p(F1|E) doesn’t have to. That makes p(F1|E) easier to build than you may have thought. It only needs to say whether or not some bag of English words corresponds to a bag of French words.”
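For those who like to see the crank actually turn, here is a minimal sketch of the division of labor just described. Everything in it is a made-up toy, not Google's system: the four-entry LEXICON, the three-sentence CORPUS, the order-blind translation_model, and the add-one smoothed bigram language_model are all assumptions chosen only to show how p(F1|E) x p(E) picks a word order.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy data: a word-for-word lexicon and a tiny English corpus.
FRENCH = ["le", "pomme", "a_mangé", "le", "garçon"]
LEXICON = {"le": "the", "pomme": "apple", "a_mangé": "ate", "garçon": "boy"}
CORPUS = ["the boy ate the apple", "the girl ate the pear", "the boy saw the dog"]


def translation_model(french, english):
    """Order-blind stand-in for p(F|E): only the bag of words is compared."""
    expected = Counter(LEXICON[word] for word in french)
    return 1.0 if Counter(english) == expected else 1e-9


def train_bigrams(corpus):
    """Count bigrams (with sentence boundary markers) in the toy corpus."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for line in corpus:
        words = ["<s>"] + line.split() + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


BIGRAMS, CONTEXTS, VOCAB_SIZE = train_bigrams(CORPUS)


def language_model(english):
    """Add-one smoothed bigram stand-in for p(E)."""
    words = ["<s>"] + list(english) + ["</s>"]
    log_p = 0.0
    for prev, cur in zip(words, words[1:]):
        log_p += math.log((BIGRAMS[(prev, cur)] + 1) / (CONTEXTS[prev] + VOCAB_SIZE))
    return math.exp(log_p)


def best_translation(french):
    """Pick the E that maximizes p(F|E) x p(E) over all orderings of the bag."""
    bag = [LEXICON[word] for word in french]
    candidates = set(permutations(bag))
    return max(candidates, key=lambda e: translation_model(french, e) * language_model(e))


print(" ".join(best_translation(FRENCH)))
```

Since the translation model in this sketch gives every ordering of the bag the same score, the choice among orderings is made entirely by the language model, which is the effect at issue in this post.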

In the statistical machine translation biz, the factor p(F1|E) is known as the translation model while p(E) is known as the language model. The best translation of our French sentence is found by multiplying these two probabilities together. And so now you can already probably figure out for yourself why E=the boy ate the apple wins over the apple ate the boy: both include the same unordered bag of words, {the, apple, ate, the, boy}, but, just as you’d expect, p(E) for E=the boy ate the apple is much, much, much greater than p(E) for E=the apple ate the boy, which is never found in billions of sentences of English books scanned in by Google (see below)[5]. So the language model dominates the two product factors in such examples, to the virtual exclusion of worrying at all about actual ‘translation.’ Sacre bleu! In other words, you can work as hard as you want to perfect the translation model, really sweating out the details of what English words go with what French words, but all to no avail. You’ll still get funny examples like this one – which is really only telling you the likelihood of some particular English sentence, essentially ignoring that there is any French there at all. (This is apparently exactly what the English would like everyone to believe anyway, as verified by the famous MPHG[6] corpus.) And if you think this is all just a one-off, the list of funny examples can be expanded indefinitely: run Un hippopotame me veut pour Noël and you’ll get I want a hippopotamus for Christmas; the German Leute stehlen mein Weißes Auto surfaces as White people stole my car, and so on.[7] But even more strikingly, if you just forget about the translation model and spend all your energies on just the language model, you wind up with a better scoring translation system – at least if one applies the metric that has been conventionally used in such bake-offs, which is called BLEU. So Google Translate, then, beats out other machine translation systems, with the best BLEU scores.
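To see why a metric like BLEU is so friendly to n-gram language models, it helps to look at its core ingredient, the clipped ('modified') n-gram precision of a candidate translation against a reference translation. The sketch below computes only that ingredient. Full BLEU also combines the precisions for n = 1 to 4 in a geometric mean and applies a brevity penalty, both left out here, and the function names are my own.

```python
from collections import Counter


def ngrams(words, n):
    """All ordered n-word substrings of a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams also found in the reference, with clipped counts."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


reference = "the boy ate the apple"
candidate = "the apple ate the boy"
for n in (1, 2, 3):
    print(n, modified_precision(candidate, reference, n))
```

On this pair the unigram precision is perfect, the bigram precision is 3/4, and the trigram precision is zero: the score depends only on which ordered word sequences the candidate shares with the reference.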

In fact, it’s worth stepping back and thinking a bit more deeply about what’s going on here. How do we actually calculate p(E)? Well, Knight continues, “People seem to be able to judge whether or not a string is English without storing a database of utterances. (Have you ever heard the sentence “I like snakes that are not poisonous”? Are you sure?). We seem to be able to break the sentence down into components. If the components are good, and if they combine in reasonable ways, then we say that the string is English…. For computers, the easiest way to break a string down into components is to consider substrings. An n-word substring is called an n-gram. If n=2, we say bigram. If n=3, we say trigram. If n=1, nerds say unigram, and normal people say word. If a string has a lot of reasonable n-grams, then maybe it is a reasonable string. Not necessarily, but maybe.”

Now you already know (in part) why Google has done all this work collecting n-grams! In fact, Google has even scanned enough books (over 5.2 million) to collect enough data to get statistical estimates for 5-grams, such as The boy ate the apple. If you go here, to Google’s n-gram viewer, and type in The boy ate the apple, along with clicking the button ‘search lots of books’ you’ll see low but non-zero probability estimates for this particular 5-word sentence starting about 1900. (This value is actually # of occurrences per year, normalized by the # of books printed in each year. The value’s low because there are lots of other 5-word English sentences.) But what about the probability for The apple ate the boy? That has an estimated probability 0, since it never has turned up in any of the books in the database. In practice, since lots of 5-grams will have frequency 0, what you do is smooth the data, falling back in turn to 4-grams, trigrams, bigrams, and, if need be, single word frequencies until you find non-zero values: approximate the 5-gram The apple ate the boy as a weighted average of two 4-gram estimates: p(the|The apple ate) and p(boy|apple ate the). (Note that this doesn’t change our Bayesian best-translation finder, since to play fair we’d have to do something comparable with The boy ate the apple.) In short, the language model is just n-grams. (There’s a lot to this smoothing biz, but that’s not our main aim here; see any recent NLP book such as the Jurafsky & Martin textbook. In fact, Bayesian inference generally is a kind of ‘smoothing’ method, as we’ll discuss next time.) So, the more you improve your n-gram language model, the better your translation system’s BLEU score, as Ali Mohammed demonstrates (see note 7). And for sure Google has the best (largest!) database of n-grams. The kicker is that Ali shows that what the BLEU score actually measures is simply the ability to memorize and retrieve ordered n-grams in the first place!
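The 'falling back' idea can be sketched in a few lines. What follows is one simple scheme of this kind, usually called 'stupid backoff', not necessarily what any real system uses: if the full context plus word has been seen, use its relative frequency, and otherwise drop the leftmost context word and multiply by a fixed penalty. The penalty alpha=0.4, the three-sentence corpus, and the function names are assumptions for illustration, and the result is a score rather than a properly normalized probability.

```python
from collections import Counter


def count_ngrams(sentences, max_n):
    """Count every n-gram up to length max_n in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def backoff_score(word, context, counts, total_words, alpha=0.4):
    """Score `word` after `context` (a tuple), shortening the context when unseen."""
    penalty = 1.0
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            # A seen n-gram implies its context was seen too, so no division by zero.
            return penalty * seen / counts[context]
        context = context[1:]   # fall back to a shorter context
        penalty *= alpha        # and pay a fixed price for doing so
    return penalty * counts[(word,)] / total_words


corpus = ["the boy ate the apple", "the apple fell", "the dog ate the bone"]
counts = count_ngrams(corpus, max_n=4)
total_words = sum(c for gram, c in counts.items() if len(gram) == 1)

print(backoff_score("apple", ("boy", "ate", "the"), counts, total_words))
print(backoff_score("boy", ("apple", "ate", "the"), counts, total_words))
```

In this toy corpus the first query is answered directly from a seen 4-gram, while the second has to fall all the way back to the bigram the boy, picking up two penalties along the way.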

And you also know now what the statistically-oriented mean when they talk about a language model: it is simply any probability distribution over a set of sentences. We can estimate this by, say, collecting lots of actually uttered or written sentences. Now in general, the likelihood of any particular sentence won’t completely depend on just whether it’s ‘grammatical’ simpliciter or not in the linguist’s classical sense (though you could try to calculate it this way) – rather, it totes up how frequently that sentence was actually used in the real world, which as we all know could depend on many other factors, like the actual conversational environment, whether you are going to vote for my cousin when he runs for governor of Massachusetts next year (he made me put this shameless plug in, sorry), what you had for breakfast – anything. Now, is this state of affairs a good thing or a bad thing? Well, you tell me. If one just goes around and counts up all the butterflies in the world, is that likely to arrive at a good ‘model for butterflies’? On the other hand, for many (most/all) practical tasks, it’s proved difficult to beat trigrams – 3 word sequence frequencies – as a language model. That’s an interesting fact to ponder, because you may recall that on the minimalist account, nothing about linear order matters (only hierarchical structure, what Noam called in Geneva the ‘Basic Property,’ matters) while on the trigram account, everything about (local) linear order matters, and word triplets don’t give a hoot about the ‘Basic Property.’ Perhaps this is just some reflection of the collapse of local hierarchical structure onto the local linear sound stream. Whatever. In any case, one thing’s for sure: if you’re after some notion of “E-language” (as in, “extensional” or “external”), you simply can’t get any more “E” than this kind of language model: because what you hear and read (or scan) is E-xactly what you get.

Well, a glance at my watch says that I’ve run out of time for today (cocktail hour’s long past, and Labor Day beckons), so I’ll return to the main theme of this thread – Baze III – in a daze or two, a meditation on linguistic origins and a relationship to traditional linguistic concerns; Bayesian inference as one among many possible smoothing methods; and even Bayes as a form of S-R reinforcement learning. Until next time then: “Q: What’s a Bayesian? A: Someone who expects a rabbit, sees a duck, and concludes a platypus.”

[1]Yes, yes, I know the HW assignment was to translate, le pomme mange le garcon. I changed the example here to make the explanation a bit simpler. I didn’t get any problem sets turned in anyway so nobody will lose points.

[2]You might already see that since ‘all English sentences’ is a pretty large set, it might be hard to try out every single English sentence – even if you are one of those linguists who has curiously wavered between the view that the # of sentences in a language is beyond countably infinite to the view that E is merely finite – but we’ll return to this in just a bit below to see how this is computed in practice.

[3]Because if for some pair of English sentences E′, E*:

p(F1|E′) x p(E′)/p(F1) > p(F1|E*) x p(E*)/p(F1), then we can cancel p(F1) on both sides & get:

p(F1|E′) x p(E′) > p(F1|E*) x p(E*). (Recall that probabilities are never negative, and p(F1) can't be zero here or the division would make no sense, so this works.) Therefore we only have to compute the numerators.

[4]Which you can read in full here.

[5]For example, you can use Google Search to find all occurrences of the string “the boy ate the apple” (in quotes! About 167,000 or so when I looked) vs. “the apple ate the boy,” with only 43 results. No matter how you slice the apple, you’ll get the same yawning gap.

[6]Monty Python and the Holy Grail Corpus. Obviously. And a hell of a lot more fun than the Wall Street Journal sentences in the Penn Treebank. I swear, when I hear “Pierre Vinken…” I reach for my gun. Well, depending on how deep your sense of irony runs, even the Wall Street Journal can be funny. Not as funny as Monty Python, though.

[7]All these examples have been taken from the wonderful thesis by Ali Mohammed at MIT, available here: http://dspace.mit.edu/handle/1721.1/75653. See Ali’s Table 3.2, p. 68. Some of Ali’s examples no longer ‘work as advertised’ because (it appears) since they have become widely known (Ali and many of his friends intern at Google, after all), special-purpose work-arounds (AKA ‘hacks’) have been installed. George Bush n’est pas un idiot used to come out, George Bush is an idiot. By now, you should be able to figure out why this might happen.

29 comments:

ewan, September 3, 2013 at 5:48 AM
This sin cannot be laid anywhere near the feet of Reverend Bayes - he did not invent the n-gram language model! However, he can help. The intuition that "probability of an English sentence" must at its core be something to do with the frequency of that sentence in a corpus licenses an immediate mental shortcut whereby, when asked for Pr(E), we just go off and collect co-occurrence statistics.

But this turns (as far as RB is reputedly concerned) on a misunderstanding of what it means to be a "probability." For what we mean by "probability of an English sentence" in this context is really nothing to do with frequency, but rather just some "belief" score assigned by a unit measure on English sentences. This makes it clear that the correct interpretation of Pr(E) is in this context, of course, a grammar! If the model is (correctly) decomposed into "grammaticality score" chained with "plausibility score" or some such, then we can easily mitigate or dispense with the role of the plausibility of what's said in the MT problem.

The right model of grammar will, presumably, not be able to support strange inferential moves such as increasing the grammaticality of "NP1 V1 NP2" while decreasing the probability of "NP2 V1 NP1" except perhaps at some substantial cost. Some reasonable model of world knowledge, on the other hand, ought to be able to do just that easily for such strange events as apples eating boys. Both the bad model and the good model can be trained on corpus data - again, thanks to RB: hierarchical Bayes makes it simple to cook up a coherent way to infer the hidden parts of the "realistic" model, at least in principle.
Unknown, September 3, 2013 at 9:15 AM
I have a few comments to Robert's post:

I was surprised that google would change tense from present to past in the German example and gave it a try. I discovered several things. Tense was never changed. If one takes the sentence you provided

Leute stehlen mein Weißes Auto. - One gets: White people steal my car.

But Germans would not capitalize 'white' and if one uses the correct:

Leute stehlen mein weißes Auto - google spits out: People steal my white car.

So it would seem word order has little to do with the 'funny' result but the incorrect capitalization is to blame. This brings me nicely to the problem I have with using google translate as a tool in the ongoing Bayesian bashing. You use the example of a handful of sentences that are messed up [and not even in all cases for the reason you claim] but ignore the gazillions that are translated properly. So maybe the question to ask is not why these few examples are messed up but why so many come out right given the unsophisticated bi-, tri-, or even quadruple-grams you say are used by google? How can we explain that machine translation works as often as it does? It seems your examples are heavily skewed by your prior: anything but Chomskyan must be wrong. BTW, I gladly grant that google translate is a lousy model for human language use. But that is really a minor point; I doubt anyone thinks machine translation is terribly relevant to human language use or, more relevantly, language acquisition. So showing that machine translation fails at times [who knew!] is not really showing non-Chomskyan models of language acquisition are wrong [see below].

Now, as anyone who did the 'homework' reading the BBS paper linked to by Norbert [thanks for the assignment] must have learned from the Chater et al. commentary: fundamentalist Bayesians who think stats is all that matters do not exist. Seemingly that needs to be said explicitly because you say:

"That’s an interesting fact to ponder, because you may recall that on the minimalist account, nothing about linear order matters (only hierarchical structure, what Noam called in Geneva the ‘Basic Property,’ matters) while on the trigram account, everything about (local) linear order matters, and word triplets don’t give a hoot about the ‘Basic Property.’"

The second part is an exaggeration. But I am really more interested in the first. What IS the model of the minimalist account? You say: "if you’re after some notion of “E-language” (as in, “extensional” or “external”), you simply can’t get any more “E” than this kind of language model: because what you hear and read (or scan) is E-xactly what you get."

Okay, let's ignore all this nonsensical E-language that, according to you, gets us nowhere, and talk models that do get us places. What model of the brain [I-language] do you currently have? You say: on the minimalist account, NOTHING about linear order matters (only hierarchical structure [does]). Okay, let's take that seriously: how do we learn about hierarchical structure if E-language does not matter - are we inferring hierarchical structure from I-language, that is brain structure [I seem to remember Chomsky said we don't]?

Let me end by stressing: I do not want to defend an account here that opposes your view. Rather I would like to encourage you to tell us more about how YOUR account actually works vs. just beating up armies of straw men. That way we can all learn something worthwhile...

Chris, September 3, 2013 at 9:31 PM
Google translate has many interesting lessons, but I don't think this example sheds much light on the limitations of Bayesian reasoning or of n-gram probabilities for language models. First, whether you have a frequentist or Bayesian view of probability (and, using any of the standard ways of axiomatizing probability), Bayes rule follows as a theorem.

So, if Bayes is okay, what is the problem with Google translate? Let's take an example that's on more neutral territory (that has nothing to do with language). Imagine comparing a naive Bayes classifier and a logistic regression classifier that use the same features to predict some property about widgets. If NB fails to predict as well as LR, we don't conclude that Bayes was wrong, we conclude that being naive was wrong. As with naive Bayes, the problem with Google translate is that the models of E and F|E are bad. More specifically, I think it's probably safe to assume that Google's model of E is *better* than its model of F|E. Since, although E has poor correspondence to reality in terms of process, it has been trained on a lot of data and probably is a good model of it (epicycles were bad models of I-planetary motion, but reasonably good models of E-planetary motion; and, all we care about in translation is whether we have a good model of observations).

To demonstrate what a rather standard Google-style n-gram model thinks of some of the sentences we might consider in this translation problem, I asked a 4-gram LM that was trained on about 10 billion words of English what it thought of all the permutations of "the apple ate the boy .". Here's what it had to say (the number on the right is the log_10 probability of the sentence):

the apple ate the boy . | -16.431
the boy ate the apple . | -16.4482
the the boy ate apple . | -17.9732
the apple the boy ate . | -18.2157
the boy the apple ate . | -18.2807

While this model clearly has different intuitions than a human speaker, it also manages to get the grammatical order above the rest. (It actually prefers the "bizarre" order since the news data I trained on didn't have boys or apples doing any eating; in any case, this isn’t a mere accident, this has to do with the interaction of a bunch of factors, assumptions made by the parameter estimator that know what contexts are likely to contain open-class words or closed class words, etc.). Anyway, my point here is just to argue that even with relatively small amounts of data that even I can get my hands on, n-gram models don't make terrible generalizations. Sure, we should be able to do better, but this probably isn’t the worst of Google’s sins.

However, as your example clearly shows, Google is clearly doing badly, so I think we should probably blame the translation model (i.e., the likelihood or F|E). It has either preferred to invert the subject and object positions (which defies our knowledge of French/English syntactic differences) or it has failed to suitably distinguish between those two orders although it should have.

In summary, I think this should be interpreted as a lesson in the importance of good models, or, if you can't have a good model, a good estimator (clearly, I'm not out of the job yet). But, I don't think we should blame the good reverend any more than we do when naive Bayes fails to perform as well as a less naive estimator.
Alex Clark, September 4, 2013 at 2:33 AM
It's the translation model that is the problem; and I think (I am not an SMT expert so take this with a pinch of salt) that this is because English-French is so easy from a word order point of view that there is no gain in practice from deploying a syntactic translation model as opposed to a low level IBM model.

Try this with an SMT system that uses a hierarchically structured translation model as people use for say English-Urdu.
There are translation systems that use hierarchical structure --
(see Koehn's slides here: http://homepages.inf.ed.ac.uk/pkoehn/publications/esslli-slides-day5.pdf)
but they also use Bayes rule of course.
Robert Berwick, September 4, 2013 at 3:23 PM
Of course, the Reverend's 'sin' was hyperbolic - a lure. The point was not the math, but its application: what happens when one turns that soothingly clear Bayesian crank, chopping a problem into apparently 2 neat parts & then evaluates the results according to the commonly-used litmus test, BLEU, all without thinking about what sausage pops out at the other end. Since BLEU is based on ordered n-grams, then you wind up rewarding ordered n-gram models. Ali shows that doubling the language model training set will always boost the BLEU score. So you might get seduced into thinking that this is always the way to go. Yes, Google Translate might not be the best engine to illustrate these issues because we really don't know what's inside, but on the other hand GT is something anyone can try out for themselves without me having to explain how to install some complicated publicly available software like Moses -- the package that Ali actually used in his thesis.

I agree with Chris: since GT has lots of data its P(E) model is probably darn good. So it (properly) assigns a high score to "the boy ate…" and a very low score to "The apple ate…" But that means (in agreement with Chris and Alex) that our translation model P(F|E) will have to be even better to overcome this effect. The fact that capitalization can warp translations actually supports this view: "Leute stehlen mein weiß Auto" --> "White people steal my car"; but "Leute stehlen mein weiß auto" --> "People steal my car white". Two years ago, the French "George Bush n'est pas un idiot" came out as: "George Bush is an idiot" but that feedback button in GT, used enough, seems to have worked, so this no longer happens. (Same deal with Italian.) However, if you take a less-picked-over language, say, Polish, then "George Bush nie jest idiotą" does still pop out as "George Bush is an idiot." Perhaps people will poke the Polish feedback button and it too will fix this example. Note that grammars (in the linguists' sense) don't really show up in any of this.
Robert Berwick, September 4, 2013 at 3:31 PM
Packing grammatical information into P(E) as Ewan suggests sounds challenging. If I understand this correctly, it would entail constructing a 'generative' story in the probabilistic sense -- a sequence that starts with 'the right model of the grammar' for E and winds out to a production model, along with other conditioning effects (a 'reasonable model of world knowledge'). Ewan says 'hierarchical Bayes makes it simple….at least in principle.’ I worry about that 'in principle' tag because as noted in Bayes Daze I, every hierarchical Bayesian model (HBM) I've had the chance to examine in detail, e.g., models that learn which English verbs alternate or not (like 'give'), is readily shown to be equaled by much simpler methods. The challenge in drawing a straight line between ‘grammatical’ and ‘likely’ was discussed a bit in an earlier blog, “Revolutionary new ideas…”, here. Further, we don't have any good story to tell about what 'compiler' the brain might have that turns acquired 'knowledge of language' into the knowledge we 'put to use' (or even if this makes sense). If someone presents a concrete proposal using HBMs to find, e.g., 'latent variable structure' in this way, that would be of great interest, but it still seems a long road from a grammar to actual language use.

Rather Ali notes, "The idea that language is a simple Markov chain is frustrating; the late Fred Jelinek (who initiated a lot of the statistical MT work) describes this model as 'almost moronic... [capturing] local tactic constraints by sheer force of numbers, but the more well-protected bastions of semantic, pragmatic and discourse constraint and even morphological and global syntactic constraint remain unscathed, in fact unnoticed'". Ali's move is not to improve the language model or translation model -- but to ditch BLEU. He revives an old tradition in psychometrics, forced choice metrics (as used by Thurstone starting in the 1920s), but in a novel setting with Mechanical Turk and a bit of novel math, to arrive at a human-assisted scoring method that is still fast, but less sensitive to the language model effect of always training for better and better n-gram performance and BLEU scores (but worse translation when carefully judged by people).
Alex Clark, September 5, 2013 at 2:47 AM
Bob, this discussion reminds me of your critique of treebank parsing that you did with Sandiway Fong. So you point out some problems with modern statistical NLP techniques which tend to use shallow and linguistically ill-informed techniques.

But what is the take home point? Is the argument that we should use models based on sound principles of Chomskyan syntax? Or we should use more linguistically well informed models in general albeit non Chomskyan? Or that we should stop using Bayes rule? Or should we stop using probabilities completely?

Because there are already many people trying to build better NLP systems by adding in linguistic know-how. But it is very hard and has limited success so far.

VilemKodytek, September 5, 2013 at 3:38 AM
Re: Google translate. Take the Czech sentence
Lidi kradou moje bily auto
which translates into English word by word as
People steal my white car
However, the Google translation reads:
White people steal my car.
Robert Berwick, September 8, 2013 at 11:07 AM
Perhaps this is a good way to round things off, a few more examples from GT:
'mange garcon le' ==> the boy eats

'mange pomme garcon le le' ==> boy eats the apple

Chacun à son goût, as they say.
Did folks read the NYtimes story about the poker machine trained via neural networks that seemingly can beat any human at Texas Hold'Em? Here's an excerpt:

"The machines, called Texas Hold ‘Em Heads Up Poker, play the limit version of the popular game so well that they can be counted on to beat poker-playing customers of most any skill level. Gamblers might win a given hand out of sheer luck, but over an extended period, as the impact of luck evens out, they must overcome carefully trained neural nets that self-learned to play aggressively and unpredictably with the expertise of a skilled professional. Later this month, a new souped-up version of the game, endorsed by Phil Hellmuth, who has won more World Series of Poker tournaments than anyone, will have its debut at the Global Gaming Expo in Las Vegas."

Back to the choices, then: which would you rather know?
(a) The machine's inference/learning method, namely, reinforcement learning (and perhaps the final output weights on the many, many artificial neurons, viz., that unit A has weight 0.03447, B has weight 1.4539 etc) or

(b) Nothing whatsoever about how it got to this final state, but an explanation of how and why the machine works so well *of the usual scientific sort*, viz., with counterfactual conditionals and such stuff that philosophers of science endorse.
My choice is (b).

Finally (!) as a side comment re the Ptolemaic/Copernican accounts of planetary motion: this is a great example, since it's straightforward to show that the Ptolemaic (epicycle) model (and 'model' is the right word) can mimic *any* kind of motion - planetary or not – to any desired degree of accuracy. How so? Well, epicycles are arcs of circles of varying radii. If you can add up any number of circles of varying radii - recall the eqn of a circle is a sine/cosine fn. Different epicycles= any # of sine/cosine fns, added up. But we already know what that is: it is a Fourier series. So *any* fn can be approximated this way. No wonder the Ptolemaic model was safe from empirical refutation - and indeed it performed (still!) far better than the alternative Copernican system. The punchline is that the Copernican system is *explanatory* - it says what *cannot* be a planetary motion. (There's a lot more to say about this fascinating history of science example, but I shall stop here.)
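The epicycles-as-Fourier-series point can be checked numerically. The sketch below is only an illustration under my own assumptions: it samples a decidedly non-planetary closed path (a square) as complex numbers, uses a discrete Fourier transform so that each coefficient is one circle (radius = its magnitude, speed = its frequency), and measures how well the largest few circles reproduce the path at the sample points. The function names are invented for the example.

```python
import cmath


def square_path(points_per_side=16):
    """Sample a square traversed at constant speed, as complex numbers."""
    corners = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
    path = []
    for a, b in zip(corners, corners[1:] + corners[:1]):
        for step in range(points_per_side):
            path.append(a + (b - a) * step / points_per_side)
    return path


def dft(points):
    """Discrete Fourier coefficients: one rotating circle per coefficient."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * cmath.pi * k * t / n) for t, p in enumerate(points)) / n
        for k in range(n)
    ]


def epicycle_position(coeffs, t, keep):
    """Position at sample time t using only the `keep` largest circles."""
    n = len(coeffs)
    largest = sorted(range(n), key=lambda k: -abs(coeffs[k]))[:keep]
    total = 0
    for k in largest:
        freq = k if k <= n // 2 else k - n  # signed turns per period
        total += coeffs[k] * cmath.exp(2j * cmath.pi * freq * t / n)
    return total


def rms_error(path, coeffs, keep):
    """Root-mean-square distance between the path and its epicycle approximation."""
    squared = [abs(p - epicycle_position(coeffs, t, keep)) ** 2 for t, p in enumerate(path)]
    return (sum(squared) / len(path)) ** 0.5


path = square_path()
coeffs = dft(path)
for keep in (1, 2, 8, len(path)):
    print(keep, rms_error(path, coeffs, keep))
```

With all the circles switched on, the sampled square is reproduced up to rounding error, which is the sense in which such a model can fit whatever motion it is handed.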
Avery Andrews, September 9, 2013 at 11:39 PM
"I think that where we might diverge is what is the hard problem here. To my mind, it is finding out what UG looks like" To my mind, this might be the (relatively) easier problem, but, at any rate, the Reverend helps with it by upgrading the Evaluation Metric to something that the neighbors understand and is more acceptable to students.
HK, September 16, 2013 at 5:41 PM
La pomme*