
Teaching Computers Common Sense Is Very Hard

Those fancy voice interfaces are little more than immense lookup tables guided by complex statistics

We can, at times, seem so close to Star Trek.

We can talk to our phones and they will open a website, search the web, dial a number, and more. Both Alexa and Siri will change the channels on our televisions, find a movie for us, and even respond with a wry, witty comment. Surely we must be on the verge of a Star Trek-like computer?

We’re not.

For all the glitz and gee-whizziness, our computers do not understand us. Those fancy voice interfaces are little more than immense lookup tables guided by complex statistics. It makes for a good show, but Star Trek remains many light years away.

Researchers at the Allen Institute for Artificial Intelligence (AI2) recently published a paper deflating claims of rapid progress toward giving computers common sense.

The researchers looked closely at recent progress on handling Winograd Schemas, a standard test of an AI's ability to mimic common-sense reasoning. Some systems have claimed 90% accuracy on it.

A Winograd Schema is a pair of sentences whose intended meaning flips when just one word is changed. For example:

– The trophy doesn’t fit into the brown suitcase because it’s too large.

– The trophy doesn’t fit into the brown suitcase because it’s too small.

In the first sentence, the pronoun "it" refers to the trophy; in the second, to the suitcase. Machine learning systems typically have difficulty making such a common-sense distinction. The Winograd Schema Challenge contains 273 such carefully crafted sentence pairs.
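To make the task concrete, here is a minimal sketch of how a purely statistical "resolver" might attack the pair above. The co-occurrence counts are invented for illustration and do not come from any real system; the point is that such a lookup can get the right answer without understanding anything:

from collections import Counter

# Toy co-occurrence counts standing in for corpus statistics.
# All numbers here are invented for illustration.
cooccurrence = Counter({
    ("trophy", "large"): 120, ("suitcase", "large"): 80,
    ("trophy", "small"): 60,  ("suitcase", "small"): 90,
})

# The Winograd pair above: the "special word" flips the referent of "it".
candidates = ("trophy", "suitcase")
correct = {"large": "trophy", "small": "suitcase"}

def resolve(special_word: str) -> str:
    """Guess the referent of "it" from raw co-occurrence counts."""
    a, b = candidates
    if cooccurrence[(a, special_word)] >= cooccurrence[(b, special_word)]:
        return a
    return b

for word, answer in correct.items():
    guess = resolve(word)
    print(f"'...because it's too {word}.' -> guess: {guess}, correct: {answer}")

With these made-up counts, the lookup happens to score 100%. That is exactly the worry: a system can pass the test by exploiting statistics that correlate with the answer, not by knowing that large objects don't fit into small containers.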

AI2 researchers tested state-of-the-art systems against a much larger dataset, one containing roughly 44,000 sentence pairs. They found that common sense was still missing:

When they tested state-of-the-art models on these new problems, performance fell to between 59.4% and 79.1%. By contrast, humans still reached 94% accuracy. This means a high score on the original Winograd test is likely inflated. “It’s just a data-set-specific achievement, not a general-task achievement,” says Yejin Choi, an associate professor at the University of Washington and a senior research manager at AI2, who led the research.

Karen Hao, “AI still doesn’t have the common sense to understand human language” at Technology Review
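That kind of collapse is easy to reproduce in miniature. The sketch below, with entirely invented data, shows a trivial rule that exploits an accidental pattern in its original test set, scores perfectly there, and drops to chance the moment the pattern no longer holds:

# A deliberately crude "model" that keys off a surface feature instead of meaning.
def spurious_rule(sentence: str) -> str:
    return "first noun" if "large" in sentence else "second noun"

# In the original (biased) set, the surface word happens to track the answer...
biased_set = [
    ("...because it's too large.", "first noun"),
    ("...because it's too small.", "second noun"),
] * 50

# ...but in a fresh set, rephrased sentences break the accident.
fresh_set = [
    ("...because it's so big.", "first noun"),
    ("...because it's so tiny.", "second noun"),
] * 50

for name, data in [("biased", biased_set), ("fresh", fresh_set)]:
    accuracy = sum(spurious_rule(s) == label for s, label in data) / len(data)
    print(f"{name} accuracy: {accuracy:.0%}")  # biased: 100%, fresh: 50%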

Often the amazing results of AI depend on something called "dataset-specific bias." In lay terms, the programmers see a pattern in the data that helps solve the problem. For example, IBM's Watson performed well at Jeopardy! in part because researchers discovered that stored Wikipedia entries contained the answers to many questions:

Infoboxes, the (now well-known) tables of facts that accompany Wikipedia pages, for instance, are generated automatically from an IE system, applied to Wikipedia pages, populating the DBPedia Knowledge Base. Watson makes use of these existing DBPedia relations in DeepQA, which is sensible because much of the information used by Watson to answer Jeopardy! questions comes from online sources like Wikipedia (more on this later). The structured resources like DBPedia can then be exploited without the need to manage separate efforts developing knowledge resources for use by the system.

Erik J. Larson, “Jeopardy! as a Modern Turing Test: Did Watson Really Win?” at The Best Schools
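To see how mechanical that kind of lookup is, here is a sketch of pulling a "fact" straight out of DBpedia's infobox-derived data with a SPARQL query. The endpoint, prefixes, and dbo:capital property follow real DBpedia conventions, but this is an illustration of the idea, not Watson's actual pipeline:

import json
import urllib.parse
import urllib.request

# Ask DBpedia's public SPARQL endpoint which entity is the capital of France.
# dbo:capital is populated automatically from Wikipedia infoboxes.
query = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?capital WHERE { dbr:France dbo:capital ?capital }
"""

url = "https://dbpedia.org/sparql?format=json&query=" + urllib.parse.quote(query)

with urllib.request.urlopen(url) as response:
    results = json.load(response)

for row in results["results"]["bindings"]:
    print(row["capital"]["value"])  # http://dbpedia.org/resource/Paris

No parsing of the question's meaning happens anywhere; the answer is simply sitting in a table, waiting to be retrieved.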

That’s not knowledge; that’s a trick. And researchers are happy to use tricks to achieve their results. See, for example, “A Surprisingly Robust Trick for the Winograd Schema Challenge.”

This is the difference between computers and humans: We understand; computers regurgitate. We read, evaluate, and make decisions. Computers operate according to patterns and rules. While I expect those patterns and rules to improve, without a conscious mind, computers will not get past regurgitation. It takes a mind to know things and to have common sense.

Note: It’s just as well we are not heading for Star Trek computers. This episode of the original series aired in 1968: “Kirk and a sub-skeleton crew are ordered to test out an advanced artificially intelligent control system – the M-5 Multitronic system, which could potentially render them all redundant.”


Further reading on Winograd schemas:

AI is no match for ambiguity: Many simple sentences confuse AI but not humans (Robert J. Marks)

and

Computers’ stupidity makes them dangerous: The real danger today is not that computers are smarter than us, but that we think computers are smarter than us

Also: Why did Watson think Toronto was in the USA? How that happened tells us a lot about what AI can and can’t do, to this day.


Brendan Dixon

Fellow, Walter Bradley Center for Natural & Artificial Intelligence
Brendan Dixon is a Software Architect with experience designing, creating, and managing projects of all sizes. His first foray into Artificial Intelligence was in the 1980s, when he built an Expert System to assist in the diagnosis of software problems at IBM. Since then, he’s worked as both a Principal Engineer and a Development Manager for industry leaders, such as Microsoft and Amazon, and for numerous start-ups. While he has spent most of that time working on other types of software, he’s remained engaged and interested in Artificial Intelligence.
