A recent paper tries to outline the difference between what humans see and what machines see:
“The way that the human mind, the human visual system, understands shape is a mystery that has baffled people for many generations, partly because it is so intuitive and yet it’s very difficult to program” says Jacob Feldman, a psychology professor at Rutgers University.
A paper published in Scientific Reports in June comparing various object recognition models came to the conclusion that people do not evaluate an object like a computer processing pixels, but based on an imagined internal skeleton. In the study, researchers from Emory University, led by associate professor of psychology Stella Lourenco, wanted to know if people judged object similarity based on the objects’ skeletons—an invisible axis below the surface that runs through the middle of the object’s shape. The scientists generated 150 unique three-dimensional shapes built around 30 different skeletons and asked participants to determine whether or not two of the objects were the same. Sure enough, the more similar the skeletons were, the more likely participants were to label the objects as the same. The researchers also compared how well other models, such as neural networks (artificial intelligence–based systems) and pixel-based evaluations of the objects, predicted people’s decisions. While the other models matched performance on the task relatively well, the skeletal model always won.Dana G. Smith, “No Bones About It: People Recognize Objects by Visualizing Their Skeletons” at Scientific American
The paper, published in June in Nature Scientific Reports is open-access.
Other researchers have critiqued the underlying assumptions of the study:
“The shapes that they generated are directly related to the hypothesis they’re testing and the conclusions they’re drawing,” says James Elder, a professor of human and computer vision at York University in Toronto. “If we’re interested in how important skeletons are to shape and object perception, we can’t really answer that question by only looking at the perception of skeleton-generated shapes. Because obviously in a world of skeleton-generated shapes, skeletons are probably fairly important because that’s the way those shapes were made.”Dana G. Smith, “No Bones About It: People Recognize Objects by Visualizing Their Skeletons” at Scientific American
That said, humans can intuit the underlying forms that govern shapes by attempting to guess the intentions of other humans. Machine vision does not intuit things, which may be one reason for odd misidentifications. For example, is this a teapot or a golf ball?:
Machine vision thought there was only .41% chance that item pictured above was a teapot. Readers probably assumed it was intended to be a teapot because the teapot shape is iconic. And, whatever it is intended to be, it wouldn’t roll evenly, which is the essential feature of a ball. The machine does not think of things like that on its own.
Here’s another example, via Auburn University, of how machine vision has interpreted objects:
These are not errors a human would make because, once we grasp the essential nature of the object, we assign probabilities based on known facts. A school bus might all too easily end up on its side blocking the road but will not turn into a punching bag.
Perhaps we are learning as much about ourselves from this research as we are about the machines.
Humans do this all the time: Put a dog costume on a cat and we’ll still see a cat. We know enough to ignore the costume and see the cat. In a sense, we compress the canine information we receive to get at the feline essence. AI researchers had hoped (suggested? argued?) that Deep Learning systems do, more or less, the same thing. It turns out, they don’t.
Further reading: Researchers: Deep Learning vision is very different from human vision. Mistaking a teapot shape for a golf ball, due to surface features, is one striking example from a recent open-access paper.
Artificial Intelligence Is Actually Superficial Intelligence, The confusing ways the word “intelligence” is used belie the differences between human intelligence and machine sophistication. (Brendan Dixon)