Sunday, June 30, 2013

A suggested entry for "Big Data" in the philosopher's lexicon

For those who have never perused the Philosopher's Lexicon, you are in for a treat (here). I have just come across the following proposed definition for "Big Data," which I found as revealing as it is amusing (here):

Big Data, n: the belief that any sufficiently large pile of shit contains a pony with probability approaching 1.

There is no substitute for thinking, not even very large amounts of data.

The idea that Big Data can be theory free is treated not as a bug, but as a feature (here). If it catches on, it might really change what we consider to be the point of science. There has always been a tight relation between the cognitive pursuit (why does it happen?) and the technological one (can I control it?). Good technology often builds on theoretical insight. But sometimes not. Big Data seems willing to embrace the idea that insight is overrated. This is why people like Chomsky and Brenner, among others, are hostile to this move towards Big Data: it changes the goal of science from understanding to control, thus erasing the useful distinction between science and engineering.


  1. amusing "lexicon" indeed:

    chomsky, adj. Said of a theory that draws extravagant metaphysical implications from scientifically established facts. "Essentially, Hume's criticism of the Argument from Design is that it leads in all its forms to blatantly chomsky conclusions." "The conclusions drawn from Heisenberg's Uncertainty Principle are not only on average chomskier than those drawn from Godel's theorem; most of them are downright merleau-ponty."

  2. Addendum:

    My amusing quote was posted before Norbert's addendum. Since that is not obvious, it now looks as if I am making fun of the latter. I am not.

    I am concerned about 'big data' as well but think Norbert misconstrues WHY we should be concerned. In all of the examples Norbert has given, the problem is not that big-data fetishists have NO theory. They do [like "[All/most] Terrorists are Muslims" in the linked piece] and they use big data to find confirmation. It is of course correct that if one looks at a large enough mass of data one can find confirmation for virtually ANY theory. But that is no new insight: Mark Twain either coined or popularized "There are lies, damn lies, and statistics" back in the 19th century. So we already know [or should know] that.

    I think it is important to deal responsibly with the big data issue because, obviously, new technology creates opportunities that need to be handled with care. But the LAST thing we need is the irresponsible dismissal of ANY data that may call into question one's theory, a dismissal that Chomsky has been promoting for a decade now [I discuss the most pernicious examples in SoL here: ]

    We teach in intro phil-o-science that of two theories T1 and T2 one should prefer T1 if it accounts better for the phenomena under consideration, is simpler, and is not in conflict with widely accepted facts. Of course in the world of real science things are messier. Data can be misinterpreted, contaminated, etc. Scientists KNOW that. So it is obvious that one should not throw out a theory [especially one that is well formulated and has great explanatory power] the first time one finds an apparent counterexample. One may be justified in setting the counterexample aside. BUT one cannot do that for EVERY counterexample until one reaches the 'argument from the Norman Conquest' and abstracts "away from the whole mass of data that interests the linguist who wants to work on a particular language" [Chomsky, 2012, p. 84].

    These repeated attacks on 'big data' [which no one here seems to defend], combined with the refusal to admit how irresponsible Chomsky's publications have become, create the unfortunate impression that Minimalism has reached the status of Marxism [which was no theory either but a world view, so please spare us the stale 'minimalism is a program, not a theory' rebuttal]: nothing can possibly falsify it.

    In order to do science one needs to be willing to accept that data can [at least in principle] falsify one's theory. I asked previously: what is the right amount of data? [Put another way: how many counterexamples are too many?] This question has not been answered by Norbert [or by anyone commenting on this blog]. So I ask again: do minimalists have an answer?