Mind Matters News contributor Gary Smith’s book, The 9 Pitfalls of Data Science (Oxford University Press, 2019), co-authored with Jay Cordes, has won the Publishers’ Weekly 2020 Prose Award for “Popular Science & Popular Mathematics.”
9 Pitfalls focuses on a key reason we need critical thinking skills about data in the age of Big Data: “Machines often find meaningless patterns that can lead to dangerous false conclusions.” Its fascinating tales of great successes and epic failures in understanding what the data really mean had already earned praise from a prominent source, as Oxford University Press notes:
“Gary Smith and Jay Cordes have a most captivating way and special talent to describe how easy it is to be fooled by the promises of spurious data and by the hype of data science.” – Professor John P.A. Ioannidis, Scientist, Stanford University, “the godfather of science reform” (Wired), “one of the most influential scientists alive” (Atlantic)
Smith is the Fletcher Jones Professor of Economics at Pomona College and a Walter Bradley Center Fellow. He is also the author of The AI Delusion and Money Machine. Jay Cordes is a software developer and data analyst who has, among other things, served as a sparring partner for a pokerbot.
On his site, Cordes lists the nine pitfalls that the book addresses:
The 9 Pitfalls of Data Science
1. Using Bad Data
2. Putting Data Before Theory
3. Worshiping Math
4. Worshiping Computers
5. Torturing Data
6. Fooling Yourself
7. Confusing Correlation with Causation
8. Being Surprised by Regression Toward the Mean
9. Doing Harm
But which mistakes can really affect your career? That depends on what you do for a living. We asked Gary Smith some questions to tease out the risks. Suppose you are a decision-maker of some kind:
Mind Matters News: Of the nine pitfalls of interpreting data listed by your co-author, which is the most dangerous for public policy?
Gary Smith: Putting data before theory. Many data-mining algorithms that are now being used to screen job applicants, price car insurance, approve loan applications, and determine prison sentences have significant errors and biases that are not due to programmer mistakes and biases, but to a misplaced belief in data-mining.
This blind faith is truly frightening. Two Chinese researchers reported that they could predict with 89.5 percent accuracy whether a person is a criminal by applying their AI algorithm to scanned facial photos. A prominent data scientist wrote that “the study has been conducted with rigor. The results are what they are.” A blogger wrote in response, “What if they just placed the people that look like criminals into an internment camp? What harm would that do?”
Mind Matters News: Okay, Dr. Smith, but many of us are usually on the other side of the decision desk when it comes to getting hired, getting a car, or getting charged with an offense. Which is the easiest misunderstanding about data for an average person to fall into?
Gary Smith: The belief that correlation is causation. We see innumerable coincidental patterns in our lives, and we are hard-wired to think that patterns are meaningful. Some obviously are not: there is a 0.99 correlation between annual beer sales in the U.S. and marriages. So does drinking cause marriage, or does marriage cause drinking?
Other patterns are deceptive. Google Flu Trends identified several search terms that were correlated with flu outbreaks, and Google reported that its model was 97.5 percent accurate. After the report was issued, Google Flu Trends overestimated the number of flu cases in 100 of the next 108 weeks, by an average of nearly 100 percent.
Mind Matters News: It sounds as though even serious nerds get fooled by that one! But now, which data interpretation error is the hardest for an average person—who is being honest—to even spot?
Gary Smith: The hardest one to spot is probably “regression toward the mean.” Here’s an example: If the British Open champion loses the next tournament, we might conclude that he wasn’t focused. If the student with the highest test score does not do as well on her next test, we might conclude that she did not study. If the patient with the worrisome medical result fares better a month later, we might conclude that whatever treatment was prescribed was effective.
In reality, outstanding results are often due in part to luck and thus no special explanation is needed as to why they are not repeated. If we recognize that luck may have played a role, we are more likely to realize that the tournament winner is not necessarily the best player, that the student with the highest test score is not necessarily the best student, and that the patient with a worrisome medical reading on one test does not necessarily have a disease.
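A small simulation makes the luck argument concrete. It assumes a deliberately simple, hypothetical model in which every score is a player's fixed skill plus fresh, independent luck each round; the number of players and the "top 10 percent" cutoff are arbitrary choices. The players who top the first round are, on average, still above the field in the second round, but noticeably less far above it, even though nobody's skill changed.

```python
import random
import statistics


def regression_demo(n_players=1000, top_fraction=0.1, seed=1):
    """Hypothetical model: score = fixed skill + independent luck each round."""
    rng = random.Random(seed)
    skills = [rng.gauss(0, 1) for _ in range(n_players)]
    round1 = [s + rng.gauss(0, 1) for s in skills]
    round2 = [s + rng.gauss(0, 1) for s in skills]  # same skill, new luck

    # Select the top performers using round 1 only.
    k = int(n_players * top_fraction)
    top = sorted(range(n_players), key=lambda i: round1[i], reverse=True)[:k]

    top_round1 = statistics.mean(round1[i] for i in top)
    top_round2 = statistics.mean(round2[i] for i in top)
    field_round2 = statistics.mean(round2)
    return top_round1, top_round2, field_round2


first, second, field = regression_demo()
print(f"top group, round 1:   {first:.2f}")
print(f"top group, round 2:   {second:.2f}")
print(f"whole field, round 2: {field:.2f}")
```

Because skill and luck have equal spread in this toy model, the top group's edge in round 2 should, on average, be roughly half its round-1 edge; giving luck a larger role relative to skill makes the regression stronger.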
Mind Matters News: But now, tell us: which error is most likely to happen only to a serious nerd?
Gary Smith: Worshiping math. One reason for the mortgage meltdown that triggered the Great Recession was that Wall Street trusted complex mathematical models that the brokers didn’t understand. On the eve of the meltdown, an American International Group (AIG) executive boasted, “It is hard for us, without being flippant, to even see a scenario within any kind of realm of reason that would see us losing one dollar in any of those transactions.” A year later, AIG received a $180 billion federal bailout.
Smith’s irreverent attitude toward computers is on display in a recent Oxford University Press Blog entry, where he writes:
Despite their freakish skill at board games, computer algorithms do not possess anything resembling human wisdom, common sense, or critical thinking. Deciding whether to accept a job offer, sell a stock, or buy a house is very different from recognizing that moving a bishop three spaces will checkmate an opponent. That is why it is perilous to trust computer programs we don’t understand to make decisions for us.
– Gary Smith, “AI is dangerous, but not for the reasons you think,” at Oxford University Press Blog (December 18, 2019)
Smith (2014) has also examined the odd fact that catchy stock ticker names actually make a difference, probably for psychological reasons. He wrote about ticker names in more detail here at Mind Matters News: A BABY, A GEEK, AND A COW… all walk into a bar looking for some BEER and VINO…
But Smith also writes about the many theories of numbers that don’t add up. Here’s a selection of his articles at Mind Matters News:
Mind Matters: 7-Eleven Babies
Mind Matters: The Paradox of Luck and Skill
Mind Matters: Computers’ Stupidity Makes Them Dangerous
Mind Matters: A BABY, a GEEK, and a COW
Mind Matters: Love Math and Computers? Love can be Blind
Mind Matters: The World Series: What the Luck?
Mind Matters Bingecast: Cheese Consumption and Tangled Bedsheets
Mind Matters: Why Shuffle the Deck 7 Times?
Mind Matters: Ransacking Flawed Data for Hidden Treasures Seldom Ends Well
Mind Matters: Is Hot Hands Just a Basketball Myth?