Just over three years ago, in March 2016, artificial intelligence leaped across a once-formidable hurdle: AlphaGo, DeepMind's advanced AI, beat world champion Lee Sedol at Go, a game many thought no AI could master.
It was a stunning victory that launched the steady march of AI progress in replacing us. Or did it? Stay for the second act:
When Facebook attempted to replicate AlphaGo, the system developed by Alphabet’s DeepMind to master the ancient game of Go, the researchers appeared exhausted by the task. The vast computational requirements—millions of experiments running on thousands of devices over days—combined with unavailable code, made the system ‘very difficult, if not impossible, to reproduce, study, improve upon, and extend,’ they wrote in a paper published in May.

Gregory Barber, “Artificial Intelligence Confronts a Reproducibility Crisis” at Wired
The Facebook team did eventually succeed in replicating AlphaGo’s success. But their difficulty exposes a problem with Deep Learning-based AI: reproducibility.
In science, only reproducible results are considered to add to knowledge. If no one can reproduce your results, did you discover something new or did you just get lucky?
For example, when researcher Joelle Pineau, a McGill University professor of computer science, asked her students to replicate the results from another lab, they succeeded only after they had “tried some ‘creative manipulations’ that didn’t appear in the other lab’s paper.”
The randomness at the core of Deep Learning-based AI, from the order in which training data is presented to the unpredictable stochastic elements within the algorithms themselves, almost guarantees that attempts to exactly reproduce results will fail. Further, the sheer volume of data and the possibility of undocumented “hand-tuning” work against replication.
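A toy sketch can make this concrete. The snippet below is not any real training framework; it is a hypothetical "training loop" in which the final result depends on two sources of randomness named above: the shuffled order of the data and random noise injected into each update. Two runs with the same seed agree exactly, while two runs with different seeds diverge.

```python
import random

def train(seed, steps=1000):
    """Toy stochastic 'training' loop (illustrative only).

    The final weight depends on the shuffled data ordering and on
    random noise added to each update, mimicking the two sources of
    randomness that frustrate exact replication.
    """
    rng = random.Random(seed)
    data = list(range(100))
    rng.shuffle(data)                 # randomness source 1: data order
    weight = 0.0
    for i in range(steps):
        x = data[i % len(data)]
        noise = rng.gauss(0, 0.1)     # randomness source 2: stochastic update
        weight += 0.01 * (x - weight) + noise
    return weight

# Same seed: identical results. Different seed: different results.
print(train(0) == train(0))   # True
print(train(0) == train(1))   # False (almost surely)
```

Real Deep Learning systems add further complications — parallel hardware, library versions, and unreported hyperparameter tuning — so even publishing the seed is rarely enough to guarantee bit-for-bit replication.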
While no one doubts that the researchers achieved what their papers claim, when others cannot reproduce those results it becomes impossible to verify, let alone improve upon, those systems, given how large a role chance plays in the outcomes.
Failure to replicate results stymies AI research. Sure, we can measure how successfully a system might, say, identify a photo of a cat from a known image set. But how well will it handle the next image? Without verification and replication, no one can say.
As unverified—and unverifiable—Deep Learning-based AI moves out of labs and into our lives, it possibly endangers, or at least misleads, us. Were the results of a medical test really negative? Do our customers really want us to sell that product? Is this series of credit card charges not fraudulent? Will the autonomous car safely stop in time?
Researchers are trying to address the reproducibility problem. For example, Pineau has published a Reproducibility Checklist. Others are available, including one from the Allen Institute for Artificial Intelligence (AI2), founded by Paul Allen, whose recent work with Aristo passed a major portion of an 8th-grade science test.
But while Deep Learning-based AI has achieved more gains more quickly than prior approaches, until researchers, and the watchdogs meant to protect the public, such as the NHTSA, can verify and replicate those results, we should be wary of ceding decisions to unmonitored systems, no matter how cool they might seem or how astonishing the supposed results.
After all, we test and license all other professionals, including, for example, hairstylists. Should AI professionals not be accountable too?
Also by Brendan Dixon:
Why AI appears to create things
Why AI fails to actually create things