I have a childhood friend, Bill, who is a baseball savant. Who was the last Major League Baseball (MLB) player to bat 0.400 in the regular season? Ted Williams, 0.406 in 1941. Which pitcher has the most career wins? Cy Young, 511. Which team has won the World Series the most times? The New York Yankees, 27. Those are easy answers, familiar to most baseball fanatics.
But Bill also knows the hard ones: How many stitches are there on every major league baseball? 108. Who hit the most triples in 1954? Minnie Minoso, 18. Who led the majors in stolen bases in 1906? Frank Chance, 57.
Bill has “memory without reckoning.” Like the famous young savant who could recite every word of the six-volume History of the Decline and Fall of the Roman Empire without any comprehension of the content, Bill has no real understanding of the game of baseball. He doesn’t know what triples are or why they are rare. He doesn’t know why players try to steal bases. He recites without thinking.
Bill’s prodigious memory for baseball trivia is astonishing but useless and obsolete now that computers can swiftly access every baseball fact that he knows. Yet, computers also have memory without reckoning; they literally do not know what any of these facts mean.
It gets worse. Not only do computers not know why triples are rare or why players try to steal bases, they do not know in any meaningful sense what the words triple, bases, or baseball mean. They cannot even identify a picture of a baseball that has been altered by a few pixel changes.
Nor can computers answer simple questions like these with certainty:
- Is it easier to field a ground ball if I close my eyes?
- Should a baserunner hop when he wants to steal a base?
- Is it time to relieve a pitcher when he chews licorice?
It is not just baseball. Computers are savants in many fields. Consider bitcoin, the best-known cryptocurrency.
Computers can recite definitions of bitcoin without understanding any of the words in the definitions. They can put the word “bitcoin” next to bitcoin images they have trained on without knowing that bitcoin is not a physical object. Computers can retrieve historical bitcoin prices without having any idea why bitcoin prices fluctuate or how bitcoin price fluctuations might affect other things.
One data-mining algorithm found a positive, statistically significant correlation between bitcoin prices and stock prices in three industries: consumer goods, healthcare, and fabricated products. Are these meaningful relationships or meaningless coincidences? The algorithm doesn’t know because it doesn’t know what any of those words mean, let alone why they might be related or unrelated. All of the data are just numbers with labels that the algorithm doesn’t understand.
The algorithm did its task quickly and efficiently, data mining hundreds of variables for correlations without having the slightest idea what to make of the correlations it found. It behaves as if it were an autistic savant.
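To see why a search like this turns up "significant" correlations that mean nothing, here is a minimal Python sketch. It does not use the algorithm's actual data: it assumes 500 made-up variables of pure random noise and a target series that is also noise. The names `pearson`, `mine_correlations`, and `CRITICAL_R` are illustrative, and the cutoff of about 0.279 is the approximate two-tailed 5 percent critical value of the correlation coefficient for 50 observations.

```python
import math
import random

# Approximate two-tailed 5% critical value of r for n = 50 (assumed threshold).
CRITICAL_R = 0.279


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mine_correlations(n_vars=500, n_obs=50, seed=0):
    """Correlate a noise target with n_vars unrelated noise variables.

    Returns (index, r) for every variable that passes the significance cutoff.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = []
    for i in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson(target, candidate)
        if abs(r) > CRITICAL_R:
            hits.append((i, r))
    return hits


hits = mine_correlations()
print(f"{len(hits)} of 500 unrelated variables look 'significant'")
```

By construction, about 5 percent of unrelated variables pass the test on average, so a search across hundreds of variables is all but guaranteed to find some. The code can count them, but nothing in it can say which, if any, make sense.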
Another data-mining analysis found statistically significant relationships between bitcoin prices and stock prices in these three industries: paperboard containers and boxes; soaps, cleaners and toilet goods; and cutlery, hand tools and hardware. Were these statistical correlations positive or negative? Two were negative and one was positive, but it hardly matters since all three were no doubt coincidental, though a computer algorithm would have no way of knowing that. A black-box stock-picking algorithm might have bought or sold stock in these three industries based on bitcoin prices, and would have been the poorer for it.
Still another algorithm found that bitcoin prices could be predicted from the number of Google searches for the word Jumanji, which refers to the movie Jumanji: Welcome to the Jungle. An increase in Jumanji searches typically preceded an increase in bitcoin prices, and a decline in Jumanji searches typically preceded a decline in prices. A black-box investing algorithm might have bought or sold bitcoin based on Jumanji searches, and would have been the poorer for it.
Figure 1 shows that for January 2011 through May 2018, the time period initially considered by the data-mining algorithm, the correlation between monthly Jumanji searches and the market price of bitcoin on the first day of the next month was a remarkable 0.73.
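The correlation described here is a lagged one: each month's searches are paired with the price on the first day of the following month. A rough Python sketch of that pairing follows. The function names and the toy numbers are hypothetical and are not the search or price data behind Figure 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(monthly_searches, first_of_month_prices):
    """Correlate searches in month t with the price on day one of month t + 1.

    Both lists are indexed by month; first_of_month_prices[t] is the price
    on the first day of month t.
    """
    searches = monthly_searches[:-1]         # months 0 .. n-2
    next_prices = first_of_month_prices[1:]  # first day of months 1 .. n-1
    return pearson(searches, next_prices)


# Toy, made-up numbers chosen only to show the alignment.
toy_searches = [1, 2, 3, 4, 5]
toy_prices = [99, 2, 4, 6, 8]
r = lagged_correlation(toy_searches, toy_prices)
print(f"lagged correlation: {r:.2f}")
```

The first price and the last search count drop out because neither has a partner once the series are shifted by one month.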
Had the correlation persisted, it would have offered a novel way to get rich by using digital search data to predict the price of a digital currency: In months when Jumanji searches are high, buy on the last day of the month; in months when Jumanji searches are low, sell on the last day of the month.
Figure 1 Predicting Bitcoin Prices from Jumanji Searches
Here’s the hitch: The data-mining algorithm—as is true of all computer algorithms—had no way of determining whether the relationship it uncovered was useful or useless. The movie Jumanji: Welcome to the Jungle happened to have been released in December 2017, near the peak of the bitcoin bubble. Google searches increased prior to the movie’s release and then subsided afterward, which coincidentally matched the rise and fall of bitcoin prices. That, in turn, suggested that Jumanji search data could enable profitable predictions of bitcoin prices. Humans would know better.
But how would the algorithm have fared using that relationship to predict bitcoin prices going forward? Figure 2 shows that it would have completely whiffed on the rebound in bitcoin prices in 2019. There was a close correlation during the in-sample period used by the data-mining algorithm to discover the correlation, and no correlation at all during the out-of-sample period when the algorithm flopped trying to predict bitcoin prices.
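The in-sample versus out-of-sample comparison can be sketched in a few lines of Python. This is only an illustration with invented numbers, not the data plotted in Figure 2. The toy series are chosen so that searches and prices rise and fall together in the first six periods, after which searches keep fading while prices bounce around. The function name `in_and_out_of_sample` and the split point are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def in_and_out_of_sample(x, y, split):
    """Return (in-sample r, out-of-sample r) for series split at index `split`."""
    r_in = pearson(x[:split], y[:split])
    r_out = pearson(x[split:], y[split:])
    return r_in, r_out


# Invented toy data: first six periods are the "discovery" window.
searches = [1, 2, 4, 8, 16, 8] + [5, 4, 3, 2, 1]
prices = [10, 20, 40, 80, 160, 80] + [30, 10, 40, 10, 30]

r_in, r_out = in_and_out_of_sample(searches, prices, split=6)
print(f"in-sample r = {r_in:.2f}, out-of-sample r = {r_out:.2f}")
```

A pattern that looks perfect in the window where it was discovered can vanish entirely in fresh data, which is why the out-of-sample check matters more than the in-sample fit.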
Figure 2 Jumanji and Bitcoin
My friend Bill is great at reciting baseball trivia—which is a very narrow skill set. Computers are also great at very narrowly defined tasks, like playing Go, retrieving facts, and calculating correlations. But they are utterly unreliable for anything requiring true understanding, wisdom, or common sense.
I ask Bill baseball trivia every time I see him but I wouldn’t trust him to manage a baseball team. I use computers almost every day to retrieve obscure facts and make complicated calculations but I don’t trust them to make decisions for me.
If you enjoyed this column by Gary Smith, here are some of his other recent reflections on the fascinating world of computers and the stock market—and the World Series thrown in:
Investors, AI isn’t your big fix: In investing and elsewhere, an AI label is often more effective for marketing than for performance.
A BABY, A GEEK, and a COW all walk into a bar looking for some BEER and VINO…
and yes, the World Series, written just before the most recent game:
What the Luck? Who will win the World Series? I don’t know, but I do know that baseball is the quintessential game of luck.