In their view, it’s a dangerous mix: Humans naturally assume that all patterns are significant. But AI cannot grasp the meaning of any pattern, significant or not. Thus, from massive number crunches, we may “learn” (if that’s the right word) that
- Stock prices can be predicted from Google searches for the word “debt.”
- Stock prices can be predicted from the number of Twitter tweets that use “calm” words.
- An unborn baby’s sex can be predicted by the amount of breakfast cereal the mother eats.
- Bitcoin prices can be predicted from stock returns in the paperboard-containers-and-boxes industry.
- Interest rates can be predicted from Trump tweets containing the words “billion” and “great.”
These, Smith assures us, are actual findings from Big Data searches. He expands on the theme a bit here:
Mind Matters News: So you are saying that a human tendency to seek patterns coincides with an AI tendency to produce a variety of meaningless patterns like “An unborn baby’s sex can be predicted by the amount of breakfast cereal the mother eats.” And the combination results in bad data finding its way into science journals. Is that a fair characterization?
Gary N. Smith: I would put it this way. AI is currently based on finding patterns in numbers, pixels, sound waves, and other kinds of data. We humans are hard-wired to presume that the patterns we observe are meaningful—so, we are overly impressed by AI’s pattern-discovery prowess.
We do not fully appreciate the fact that even random data contain patterns. Thus the patterns that AI algorithms discover may well be meaningless. Our seduction by patterns underlies the publication of nonsense in good peer-reviewed journals.
The study you mentioned about moms and breakfast cereal was published in Proceedings of the Royal Society.
Mind Matters News: You distinguish between the traditional scientific method, which is theory-driven, and the data-mining approach of Big Data, which seizes on interesting patterns. The theory-driven approach can be falsified, in the sense that a hypothesis is either supported by a given study or it isn’t. Is the data-driven approach as vulnerable?
Gary N. Smith: Patterns discovered by data mining are, by definition, supported by the data. The problem is that, after the fact, researchers concoct semi-plausible theories that are consistent with the patterns they discover and then report statistical tests of these theories using the same data that yielded the theories. Such tests are meaningless.
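Smith’s point here, that a pattern mined from a data set will, by construction, look impressive when tested against that same data set, can be sketched with a short simulation. Every name and number below is invented for illustration; all the series are pure noise:

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(0)
n_obs, n_candidates = 50, 100

# The "returns" to be predicted and all 100 candidate "predictors" are
# random draws: no genuine relationship exists anywhere in this data.
target = [random.gauss(0, 1) for _ in range(n_obs)]
candidates = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_candidates)]

# Data mining: keep whichever series happens to correlate best with the target.
best = max(candidates, key=lambda c: abs(pearson(c, target)))
r_mined = pearson(best, target)
print(f"in-sample correlation of the mined 'predictor': {r_mined:.2f}")

# Re-testing the mined "theory" on the same data would be meaningless; an
# honest test uses fresh data, where the pattern evaporates.
fresh_target = [random.gauss(0, 1) for _ in range(500)]
fresh_predictor = [random.gauss(0, 1) for _ in range(500)]
r_fresh = pearson(fresh_predictor, fresh_target)
print(f"correlation on fresh data: {r_fresh:.2f}")
```

With 100 noise series to choose from, the best in-sample correlation is reliably sizable, while the same “relationship” is indistinguishable from zero on data that played no part in the search.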
Mind Matters News: What would you say is the single most damaging outcome of data-driven research? Can you think of conclusions drawn from nonsense data you’ve encountered that might have a genuinely bad effect on policy decisions in, say, economics, law enforcement, or medicine?
Gary N. Smith: One especially damaging example is a study by two Harvard professors, Carmen Reinhart and Ken Rogoff, that concluded that nations are likely to experience recessions when the ratio of federal government debt to the nation’s annual output goes above a 90 percent tipping point. This flawed conclusion persuaded many governments to try to reduce their budget deficits by cutting spending and raising taxes—which caused the deep recessions they were trying to avoid. Turning military operations over to AI algorithms is potentially even more dangerous.
Mind Matters News: What might be done to address the problem of data-driven findings in science that can’t be repeated? What would be the first steps?
Gary N. Smith: Some journals now require authors to submit pre-research plans describing how the data will be analyzed and post-research reports detailing any departures from the initial plan.
Mind Matters News: What would you see as obstacles to such a reform?
Gary N. Smith: The unscrupulous could submit their pre-research plans after they have already used data mining to discover patterns. In addition, many researchers believe that their goal is to discover statistically significant patterns and that there shouldn’t be any prior constraints on their data mining.
Mind Matters News: Is part of the problem that we are in some sense “addicted” to Big Data? Are we perhaps unwilling to give up the “magic black box” in favour of the perennial struggle for “wisdom, commonsense, and expertise,” as you put it?
Gary N. Smith: I wouldn’t say that we are addicted to Big Data but that we are addicted to patterns and Big Data has multiplied beyond belief the number of meaningless patterns waiting to be discovered—which means that the probability that a data-mined pattern is meaningful is very close to zero.
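Smith’s closing probability claim can be illustrated numerically: under a conventional 5% significance threshold, a large enough search through purely random series is essentially guaranteed to “discover” significant predictors, at a rate of roughly one in twenty. A minimal sketch, again using synthetic noise for every series:

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(1)
n_obs, n_series = 30, 1000

# With n = 30 observations, a correlation is "statistically significant"
# at the 5% level when |r| exceeds about 0.361 (two-sided t-test, df = 28).
critical_r = 0.361

# One noise series stands in for the quantity being "predicted".
target = [random.gauss(0, 1) for _ in range(n_obs)]

# Search 1,000 unrelated noise series for "significant" predictors.
false_discoveries = sum(
    1 for _ in range(n_series)
    if abs(pearson([random.gauss(0, 1) for _ in range(n_obs)], target)) > critical_r
)

# None of these patterns means anything by construction, yet roughly 5%
# of them clear the conventional significance bar.
print(f"'significant' meaningless predictors: {false_discoveries} of {n_series}")
```

Scale the search from a thousand series to the millions of variables in a Big Data trawl and spurious “discoveries” number in the tens of thousands, which is the sense in which the probability that any one mined pattern is meaningful approaches zero.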
In the end, the book is an argument for the enduring value of “human wisdom and experience,” as Smith puts it. A brief summary is available here.
You may also enjoy:
New book takes aim at phantom patterns “detected” by algorithms. Human common sense is needed now more than ever, says economics professor Gary Smith.
In science, we can’t just “settle” for data clusters. The board game, Settlers of Catan, offers a clear illustration of what can go wrong when we are duped by data clusters. (Gary Smith and Jay Cordes)