Gary Smith
Retracted Paper Is a Compelling Case for Reform
The credibility of science is being undermined by misuse of the tools created by scientists. Here's an example from an economics paper I was asked to comment on.

Why Chatbots (LLMs) Flunk Routine Grade 9 Math Tests
Lack of true understanding is the Achilles heel of Large Language Models (LLMs). Have a look at the excruciating results.

Internet Pollution — If You Tell a Lie Long Enough…
Large Language Models (chatbots) can generate falsehoods faster than humans can correct them. For example, they might say that the Soviets sent bears into space...

ChatGPT, Bing, Bard, and other large language models (LLMs) are undeniably astonishing. Initially intended to be a new-and-improved autocomplete tool, they can generate persuasive answers to queries, engage in human-like conversations, and write grammatically correct essays. So far, however, their main successes have been in providing entertainment for LLM addicts, raising money for fake-it-till-you-make-it schemes, and generating disinformation efficiently. Earlier this year Jeffrey Funk and I predicted a potentially debilitating feedback loop for LLMs. As the internet they train on becomes increasingly polluted with LLM hallucinations and disinformation, LLMs may become increasingly prone to generating hallucinations and disinformation. I recently saw a concrete, personal example of this. One embarrassment for the early versions of OpenAI’s ChatGPT was that it kept …
Computers Still Do Not “Understand”
Don't be seduced into attributing human traits to computers.

The subtitle of a recent New Yorker article was: “Geoffrey Hinton has spent a lifetime teaching computers to learn. Now he worries that artificial brains are better than ours.” I respectfully disagree. As I’ve repeatedly argued, the real danger today is not that computers are smarter than us but that we think computers are smarter than us. Hinton is extremely intelligent, but he is not the first, and will not be the last, extremely intelligent person to be seduced by a full-blown Eliza effect, attributing human traits to computers. Consider Hinton’s argument about large language models (LLMs): People say, It’s just glorified autocomplete . . . Now, let’s analyze that. Suppose you want to be really good at predicting the …
When It Comes to New Technologies Like AI, Tempers Run Hot
So far, the most tangible LLM successes have been in generating political disinformation and phishing scams.

Ask an AI enthusiast how big AI is and how big AI will become, and the answer is likely to be that it is already enormous and that we haven’t seen anything yet. Our enthusiasm is more nuanced. We gave Microsoft’s Bing with ChatGPT-4 the prompt, “How big is AI?,” and received some very specific numbers, along with helpful references: I assume you are asking about the size of the Artificial Intelligence (AI) industry. According to a report by Grand View Research, the global AI market size was valued at USD 136.55 billion in 2022 and is projected to expand at a compound annual growth rate (CAGR) of 37.3% from 2023 to 2030. Another report by Precedence Research estimates that the …
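The chatbot's "very specific numbers" are easy to check for internal consistency. A minimal sketch of the arithmetic implied by the quoted forecast (the base figure and CAGR come from the Bing/ChatGPT answer above, not from any independent verification of mine):

```python
# Checking the growth arithmetic implied by the quoted market forecast.
# The inputs ($136.55B in 2022, 37.3% CAGR over 2023-2030) are taken from
# the chatbot's answer quoted above; they are not independently verified.
base_2022 = 136.55   # billions of USD, per the quoted Grand View Research figure
cagr = 0.373         # compound annual growth rate, per the same quote
years = 2030 - 2022  # eight compounding years, 2023 through 2030

projected_2030 = base_2022 * (1 + cagr) ** years
print(f"Implied 2030 market size: ${projected_2030:,.1f}B")
```

Compounding at 37.3% for eight years multiplies the base by more than twelvefold, which is the kind of headline-grabbing trajectory such reports project.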
Let’s Dispose of Exploding Pie Charts
Pie charts are seldom a good idea. Here's why.

A picture can be worth a thousand words. A graph can be worth a thousand numbers. Unfortunately, pictures and words can be deceptive — either intentionally or unintentionally. One common type of deception involves the use of two- or three-dimensional figures to represent single numbers. For example, the Washington Post once used the figure below to illustrate how inflation had eroded the purchasing power of the U.S. dollar between 1958 and 1978. To make the figure memorable, each of the dollar bills contained a photo of the U.S. President that year in place of the George Washington image that actually appears on the $1 bill.

Washington Post, October 25, 1978

Prices more than doubled between 1958 (when Dwight Eisenhower was …
Large Language Models Are Still Smoke and Mirrors
Incapable of understanding, LLMs are good at giving bloated answers.

I recently received an email invitation from Google to try Gemini Pro in Bard. There was an accompanying video demonstration of Bard’s powers, which I didn’t bother watching because of reports that a Gemini promotional video released a few days earlier had been faked. After TED organizer Chris Anderson watched the video, he tweeted, “I can’t stop thinking about the implications of this demo. Surely it’s not crazy to think that sometime next year, a fledgling Gemini 2.0 could attend a board meeting, read the briefing docs, look at the slides, listen to everyone’s words, and make intelligent contributions to the issues debated? Now tell me. Wouldn’t that count as AGI?” Legendary software engineer Grady Booch replied, “That demo …
Computers May Know “How” but They Still Don’t Know “Why”
Computers will not equal, let alone surpass, human intelligence.

LLMs Are Still Faux Intelligence
Large language models are remarkable, but it's a huge mistake to think they're "intelligence" in any meaningful sense of the word.

A Modest Proposal for the MLB
Major League Baseball got greedy and needs to reform.

The MLB Coin-Flipping Contest
What are the chances that wild-card teams will make it to the World Series and win?

Blue Zone BS: The Longevity Cluster Myth
We need to be reminded how much real science has done for us and how real science is done.

Confusing Correlation with Causation
Computers are amazing. But they can't distinguish between correlation and causation.

Artificial intelligence (AI) algorithms are terrific at discovering statistical correlations but terrible at distinguishing between correlation and causation. A computer algorithm might find a correlation between how often a person has been in an automobile accident and the words they post on Facebook, being a good software engineer and visiting certain websites, and making loan payments on time and keeping one’s phone fully charged. However, computer algorithms do not know what any of these things are and consequently have no way of determining whether these are causal relationships (and therefore useful predictors) or fleeting coincidences (that are useless predictors). If the program is a black box, then humans cannot intervene and declare that these are almost certainly irrelevant coincidences. Even if …
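The underlying statistical point is easy to demonstrate: screen enough candidate predictors and an algorithm will find an impressive-looking correlation in pure noise. A minimal sketch (synthetic random data, stdlib only):

```python
import random

random.seed(1)

def corr(x, y):
    # Pearson correlation coefficient, computed from scratch
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# A target series and 500 candidate "predictors" -- all pure noise
target = [random.gauss(0, 1) for _ in range(20)]
candidates = [[random.gauss(0, 1) for _ in range(20)] for _ in range(500)]

best = max(abs(corr(c, target)) for c in candidates)
print(f"Best |correlation| among 500 random predictors: {best:.2f}")
```

The winning correlation looks substantial, yet every series here is random by construction: a data-mining algorithm cannot tell this "discovery" from a causal relationship.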
The LK-99 BS Further Undermines the Credibility of Science
The rejection or distortion of genuine science can have tragic consequences.

Sabrina Ionescu’s Hot Hand
When basketball players hit a "streak," does that elevate the probability of success?

Using Data Like a Drunk Uses a Lamppost
Startup companies can be tempted to use statistics for support instead of real illumination.

The LLM Deep Fake—Follow the Money
Hype sells. Just look at what's happening with AI.

Large language models (LLMs) are far more impressive now than they were just a few months or a year ago. Compare these exchanges I had with GPT-3 in June 2022 and with ChatGPT two days ago:

Gary: Can I improve my test scores if I study after taking the test?

GPT-3: That’s a good question. I’m not sure if you can improve your test scores by studying after taking the test, but it couldn’t hurt to try!

ChatGPT: Studying after taking a test is unlikely to directly improve your scores on that particular test since the test has already been completed. Once a test is finished, the opportunity to answer questions and provide responses has passed. However, studying after taking …
The Death of Peer Review?
Science is built on useful research and thoroughly vetted peer review.

Two years ago, I wrote about how peer review has become an example of Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.” Once scientific accomplishments came to be gauged by the publication of peer-reviewed research papers, peer review ceased to be a good measure of scientific accomplishments. The situation has not improved. One consequence of the pressure to publish is the temptation researchers have to p-hack or HARK. P-hacking occurs when a researcher tortures the data in order to support a desired conclusion. For example, a researcher might look at subsets of the data, discard inconvenient data, or try different model specifications until the desired results are obtained and deemed statistically significant—and therefore …
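The p-hacking mechanism described here can be simulated: run many analyses on data with no real effect, and "significant" results appear at roughly the advertised 5% rate, so a researcher who quietly tries enough subsets is nearly guaranteed a publishable hit. A minimal sketch (synthetic null data; the subset-slicing stands in for trying different specifications):

```python
import random

random.seed(0)

def t_like(sample):
    # Crude one-sample t statistic against a true mean of zero
    n = len(sample)
    m = sum(sample) / n
    var = sum((x - m) ** 2 for x in sample) / (n - 1)
    return m / (var / n) ** 0.5

# Pure noise: there is no real effect anywhere in this dataset
data = [random.gauss(0, 1) for _ in range(200)]

# Many "analyses": overlapping subsets, as a researcher slicing
# the data different ways might try
hits = 0
trials = 100
for _ in range(trials):
    subset = random.sample(data, 30)
    if abs(t_like(subset)) > 2.05:  # roughly p < 0.05 for n = 30
        hits += 1

print(f"{hits} of {trials} null analyses look 'significant'")
```

Around one in twenty null analyses clears the significance bar by chance alone; report only the hits and the published record looks like discovery.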
A World Without Work? Here We Go Again
Large language models still can't replace critical thinking.

On March 22, nearly 2,000 people signed an open letter drafted by the Future of Life Institute (FLI) calling for a pause of at least 6 months in the development of large language models (LLMs): Contemporary AI systems are now becoming human-competitive at general tasks, and we must ask ourselves: Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization? FLI is a nonprofit organization concerned with the existential risks posed by artificial intelligence. Its president is Max Tegmark, an MIT professor who is no stranger to hype.