In this week’s podcast, Gary Smith, author of The AI Delusion (Oxford University Press, 2018), talks with Robert J. Marks about how the pressure to publish or perish corrupts research, with a special focus on the Texas Sharpshooter Fallacies:
Marks: Your “Texas Sharpshooters” aren’t very accurate, are they? They’re shooting against the wall and they have no aim at all.
Smith: It’s not picking on Texas! It has to do with something called “data mining.” It’s the practice of getting a bunch of data, looking for correlations, you don’t find it, you don’t find it, you try something else, you don’t find it, you try something else… and of course, sooner or later, you find something.
Marks: So that’s related to Texas Sharpshooter Fallacy #1. Explain…
Smith: Texas Sharpshooter Fallacy #1 is that I’m going to prove what a great shot I am and so I stand outside a barn and then I go and paint a thousand targets on the barn and I fire my gun and what do you know, I am lucky, I hit a target. And then I go and erase all the other targets and I say, look, I hit the target. And, of course, it’s meaningless, because, with so many targets, I’m bound to hit something… So the Texas Sharpshooter Fallacy #1 is testing lots and lots of different theories and reporting the one that seems confirmed by the data and not telling anybody that you tested thousands of other theories.
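To see why a thousand painted targets all but guarantee a hit, here is a minimal simulation sketch. It is not from the podcast: the function names, the sample size of 30, and the cutoff of 0.361 (roughly the |r| needed for p < 0.05, two-sided, with 30 observations) are all illustrative assumptions. Every "theory" is pure random noise, yet some of them still pass the test.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_lucky_hits(n_theories=1000, n_obs=30, threshold=0.361, seed=1):
    """Count how many pure-noise 'theories' look significant by chance.

    All parameters are hypothetical. The outcome and every candidate
    predictor are independent random numbers, so any hit is luck.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_theories):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(candidate, outcome)) > threshold:
            hits += 1
    return hits


def chance_of_at_least_one_hit(n_theories, alpha=0.05):
    """Probability that at least one of n independent null tests passes."""
    return 1 - (1 - alpha) ** n_theories


if __name__ == "__main__":
    print("noise theories that 'hit the target':", count_lucky_hits())
    print("chance of a hit with 20 theories:", chance_of_at_least_one_hit(20))
```

With a 5% cutoff, about one noise theory in twenty should pass, and the chance of at least one hit is already about 64% with only 20 theories tested. Reporting the hit while erasing the misses is the fallacy.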
Marks: And you made a relationship with this, of drinking coffee and pancreatic cancer, and how this made the New England Journal of Medicine?
Smith: I think it was tobacco they were looking for and they looked at tobacco, I think it was cigarettes and cigars, they just kept looking for so many different things and they finally found a correlation between coffee drinking and pancreatic cancer.
First of all, they tested so many things that it wasn’t believable. And it also turned out that their data were skewed and the people they looked at were all going to the same doctors who were concerned with upset stomachs and things like that. And the people who had ulcers had stopped drinking coffee and the people who had pancreatic cancer had not stopped drinking coffee and so they found a correlation between the two, between drinking coffee and pancreatic cancer. And it wasn’t that coffee caused the cancer, it was that having the cancer allowed them to keep drinking coffee, unlike the ulcer sufferers who had given up coffee… The current medical research is actually that coffee is good for you, that coffee reduces the chances of cancer.
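The skewed-comparison problem Smith describes can also be sketched in a few lines. The numbers below are invented for illustration and are not taken from the study: we assume 80% of people drink coffee whether or not they have cancer, and that some fraction of the comparison patients gave up coffee because of stomach trouble. Coffee has no effect on cancer anywhere in this sketch, yet the odds ratio makes it look dangerous.

```python
import random


def odds_ratio(case_drinks, control_drinks):
    """Odds of coffee drinking among cases divided by odds among controls."""
    a = sum(case_drinks)             # cases who drink coffee
    b = len(case_drinks) - a         # cases who do not
    c = sum(control_drinks)          # controls who drink coffee
    d = len(control_drinks) - c      # controls who do not
    return (a * d) / (b * c)


def simulate(quit_rate, n=20000, base_rate=0.8, seed=2):
    """Hypothetical case-control comparison; all parameters are assumptions.

    Cases drink coffee at the base rate. Controls start at the same base
    rate, but a fraction `quit_rate` of the drinkers among them have
    given up coffee (for example, because of ulcers).
    """
    rng = random.Random(seed)
    cases = [rng.random() < base_rate for _ in range(n)]
    controls = [
        (rng.random() < base_rate) and (rng.random() >= quit_rate)
        for _ in range(n)
    ]
    return odds_ratio(cases, controls)


if __name__ == "__main__":
    print("odds ratio, fair comparison group:  ", simulate(quit_rate=0.0))
    print("odds ratio, coffee-quitting controls:", simulate(quit_rate=0.5))
```

With a fair comparison group the odds ratio should sit near 1, meaning no association. Once half of the comparison group's coffee drinkers have quit, the ratio rises well above 1 even though coffee does nothing in the model.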
Research claiming that organic food, GMOs, butter, or wine is bad for your health (some of it published in prestigious journals) typically suffers from the same basic problem, Smith explained.
Texas Sharpshooter Fallacy #2 is, he says, “You take the barn again and you don’t paint anything on it, you just fire your gun blindly and the bullet hits the barn—hopefully—and you then go and draw a target around the bullet hole and pretend that’s what you were aiming for. That’s like looking at the data and finding some coincidental little thing in there and pretending that’s what you were looking for in the first place.” He discusses the claims of discredited researcher Diederik Stapel that his research showed that “messy rooms make you racist.” Hear the rest at 6:13 here.
The enormous “publish or perish” pressure to produce such fallacy-ridden research, acknowledged by guest and host, makes clear why more and better AI is not the answer.
Crunching ever larger sets of data in ever more intricate ways would not change the basic problem: Correlations can be made to appear meaningful when they are not. And conclusions that generate must-read headlines are more readily accepted than uncertainty.
One might as well hope for a change in human nature.
See also: AI Delusions: A statistics expert sets us straight. We learn why Watson’s programmers did not want certain Jeopardy questions asked. (Robert J. Marks and Gary Smith)
Explore more of the paradoxes of Big Data: Big Data can lie: Simpson’s Paradox. The paradox illustrates the importance of human interpretation of the results of data mining.
Study shows eating raisins causes plantar warts. Sure. Because, if you torture Big Data enough, it will confess to anything.