Geoffrey Hinton has a hunch about what's next for AI

Deep learning set off the latest AI revolution, transforming computer vision and the field as a whole. Hinton believes deep learning should be nearly all that is needed to replicate human intelligence.

However, despite rapid progress, major challenges remain. Expose a neural net to an unfamiliar data set or a foreign environment, and it proves brittle and inflexible. Self-driving cars and essay-writing language generators are impressive, but things can go wrong. AI vision systems are easily confused: a coffee mug recognized from the side would be unknown from above if the system had not been trained on that view, and with a few manipulated pixels a panda can be mistaken for an ostrich, or even a school bus.

GLOM, Hinton's proposed architecture, addresses two of the most difficult problems for visual recognition systems: understanding a whole scene in terms of objects and their natural parts, and recognizing objects seen from a new viewpoint. (GLOM's focus is on vision, but Hinton hopes the idea can be applied to language as well.)

For example, an object such as Hinton's face is made up of his dog-tired eyes (too many questions, too little sleep), his mouth and ears, and a prominent nose, all topped by a mostly gray tousle of hair. And given his nose, he is easily recognized at first glance even in profile.

Both of these factors, the part-whole relationship and the viewpoint, are in Hinton's view crucial to how humans see. “If GLOM works, it will perceive in a much more human way than current neural networks,” he says.

However, grouping parts into wholes can be a hard problem for computers, because parts can be ambiguous. A circle could be an eye, a doughnut, or a wheel. As Hinton explains, the first generation of AI vision systems tried to recognize objects by relying mostly on the geometry of the part-whole relationship, that is, the spatial orientation among the parts and between the parts and the whole. The second generation instead relied mostly on deep learning, training neural networks on large amounts of data. With GLOM, Hinton combines the best aspects of both approaches.

Gary Marcus, founder and CEO of Robust.AI and a critic of the field's heavy reliance on deep learning, admires Hinton's willingness to challenge something that brought him fame and to admit that it is not quite working. “It’s brave,” he says. “And saying ‘I’m trying to think outside the box’ is a great corrective.”

GLOM architecture

In creating GLOM, Hinton sought to model some of the mental shortcuts (intuitive strategies, or heuristics) that people use to make sense of the world. “GLOM, and in fact much of Geoff’s work, is about looking at the heuristics people seem to have, building neural nets that can themselves have those heuristics, and then showing that the nets do better at vision as a result,” says Nick Frosst, a computer scientist at a language startup in Toronto who worked with Hinton at Google Brain.

For vision, one strategy is to parse the parts of an object, such as different facial features, and thereby understand the whole. If you see a certain nose, you might recognize it as part of Hinton’s face; it is a part-whole hierarchy. To build a better vision system, Hinton says, “I have a strong intuition that we need to use part-whole hierarchies.” The human brain understands this part-whole composition by creating what is called a “parse tree,” a branching diagram showing the hierarchical relationships between the whole, its parts, and its subparts. The face itself is at the top of the tree, and the component eyes, nose, ears, and mouth form the branches below.
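The part-whole hierarchy described above can be written down as an ordinary tree. The following is only an illustrative sketch: the `Part` class, the `render` helper, and the specific subparts ("iris", "tip of nose") are assumptions chosen for the example, not anything taken from GLOM itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """One node of a parse tree: a whole together with its parts."""
    name: str
    parts: List["Part"] = field(default_factory=list)


def render(node: Part, indent: int = 0) -> List[str]:
    """Return the tree as indented lines, whole first, parts below."""
    lines = ["  " * indent + node.name]
    for child in node.parts:
        lines.extend(render(child, indent + 1))
    return lines


# Hypothetical parse tree for a face: the whole at the top, parts below.
face = Part("face", [
    Part("eyes", [Part("iris")]),
    Part("nose", [Part("tip of nose")]),
    Part("ears"),
    Part("mouth"),
])

print("\n".join(render(face)))
```

Each image would produce a different tree of this kind, which is what makes the structure hard to capture in a network whose architecture is fixed.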

One of Hinton’s main goals with GLOM is to replicate the parse tree in a neural network, which would distinguish it from earlier neural networks. For technical reasons, this is hard to do. “It’s difficult because each individual image would be parsed by a person into a unique parse tree, so we would want a neural net to do the same,” says Frosst. “It’s hard to get something with a static architecture (a neural net) to take on a new structure (a parse tree) for each new image it sees.” Hinton has made various attempts. GLOM is a major revision of his previous attempt in 2017, combined with other related advances in the field.

“I’m part of a nose!”

The general idea of the GLOM architecture is as follows. The image of interest (say, a photograph of Hinton’s face) is divided into a grid. Each region of the grid is a “location” on the image; one location might contain the iris of an eye, while another might contain the tip of the nose. Each location in the net has about five layers, or levels. Level by level, the system makes a prediction, using a vector to represent the content or information. At a level near the bottom, the vector for the tip-of-the-nose location might predict: “I’m part of a nose!” And at the next level up, in building a more coherent representation of what it is seeing, the vector might predict: “I’m part of a face seen from the side!”
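As a rough sketch of the layout just described, the state of such a net can be held as a grid of locations, each carrying one vector per level. The function name `make_glom_state`, the vector dimension, the grid size, and the random initialization are all assumptions made for illustration; the article specifies only a grid of locations with about five levels each.

```python
import random


def make_glom_state(rows, cols, levels=5, dim=8, seed=0):
    """Build state[row][col][level] -> vector (a list of `dim` floats).

    Every location gets one vector per level. Values are random
    placeholders; a real system would compute them from the image.
    """
    rng = random.Random(seed)
    return [
        [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(levels)]
            for _ in range(cols)
        ]
        for _ in range(rows)
    ]


state = make_glom_state(rows=4, cols=4)

# All levels for one location, e.g. the patch holding the tip of the nose.
nose_tip_column = state[2][1]
lowest_level_vector = nose_tip_column[0]    # might encode "part of a nose"
next_level_vector = nose_tip_column[1]      # might encode "part of a face"
print(len(nose_tip_column), len(lowest_level_vector))
```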

But the question then is whether neighboring vectors at the same level agree. When they agree, the vectors point in the same direction, toward the same conclusion: “Yes, we both belong to the same nose.” Or, further up the parse tree: “Yes, we both belong to the same face.”

To reach consensus about the nature of an object, and ultimately about what exactly the object is, GLOM’s vectors are updated iteratively, location by location and level by level: each vector is averaged with the neighboring vectors beside it as well as with the vectors predicted from the levels above and below.

But Hinton says the net does not simply average with whatever happens to be nearby. It averages selectively, using neighboring predictions that show similarity. “This is well known in the United States and is called an echo chamber,” he says. “What you do is you only accept opinions from people who already agree with you, and then what happens is that you get an echo chamber where many people have exactly the same opinion. GLOM actually uses that in a constructive way.” The analogous phenomenon in Hinton’s system is its “islands of agreement.”
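One way to picture this selective averaging is a similarity-weighted mean, in which each vector gives more weight to vectors that already point in a similar direction. The sketch below is a toy under stated assumptions, not Hinton's implementation: it covers only averaging within a single level (it omits the predictions from the levels above and below), it lets every location attend to every other, and the softmax weighting with sharpness `beta` and the helper names are illustrative choices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def unit(degrees):
    """2-D unit vector pointing at the given angle."""
    rad = math.radians(degrees)
    return [math.cos(rad), math.sin(rad)]


def selective_average(vectors, beta=8.0):
    """One update: each vector becomes a similarity-weighted mean of all.

    Weights are a softmax over beta * dot product, so vectors that already
    agree dominate the average (the "echo chamber" effect).
    """
    updated = []
    for v in vectors:
        scores = [beta * dot(v, u) for u in vectors]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mean = [
            sum(w * u[k] for w, u in zip(weights, vectors)) / total
            for k in range(len(v))
        ]
        updated.append(normalize(mean))
    return updated


# Two loose groups of opinions: one around 0 degrees, one around 90 degrees.
vectors = [unit(a) for a in (-10, 0, 10, 80, 90, 100)]
for _ in range(10):
    vectors = selective_average(vectors)

print(dot(vectors[0], vectors[2]))  # within a group: close to 1
print(dot(vectors[0], vectors[4]))  # across groups: stays far from 1
```

After a few iterations the vectors within each group become nearly identical while the two groups stay distinct, which is the behavior the echo chamber analogy describes.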

“Geoff is a very rare thinker …”


“Imagine a bunch of people in a room, shouting slight variations of the same idea,” says Frosst. Or imagine those people as vectors pointing in slight variations of the same direction. “After a while, they would converge on the one idea, and they would all feel it more strongly, because they had it confirmed by the other people around them.” That is how GLOM’s vectors reinforce and amplify their collective predictions about an image.

GLOM uses these islands of agreeing vectors to accomplish the trick of representing a parse tree in a neural network. Whereas some recent neural networks use agreement between vectors for activation, GLOM uses agreement for representation, building up representations of things within the net. For example, when several vectors agree that they all represent part of the nose, their small cluster of agreement collectively represents the nose in the net’s parse tree for the face. Another small cluster of agreeing vectors might represent the mouth in the parse tree. And the large cluster at the top of the tree represents the emergent conclusion that the image as a whole is Hinton’s face. “The way the parse tree is represented here,” Hinton explains, “is that at the object level you have a big island; the parts of the object are smaller islands; the subparts are even smaller islands.”
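To make the island idea concrete, the sketch below groups neighboring locations whose vectors agree, one level at a time. It is a simplified illustration: it uses a single row of locations rather than a 2-D grid, and the cosine-similarity threshold, the function names, and the hand-written `NOSE`, `MOUTH`, and `FACE` vectors are assumptions made for the example.

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def islands(row, threshold=0.99):
    """Split a row of vectors into runs of adjacent, agreeing vectors.

    Returns (start, end) index pairs, end exclusive. A new island starts
    wherever two neighbors fall below the similarity threshold.
    """
    runs = []
    start = 0
    for i in range(1, len(row) + 1):
        if i == len(row) or cosine(row[i - 1], row[i]) < threshold:
            runs.append((start, i))
            start = i
    return runs


# Hypothetical converged vectors for five locations across a face.
NOSE, MOUTH, FACE = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
part_level = [NOSE, NOSE, NOSE, MOUTH, MOUTH]
object_level = [FACE, FACE, FACE, FACE, FACE]

print(islands(part_level))    # two smaller islands: nose, then mouth
print(islands(object_level))  # one big island: the whole face
```

Read from the top level down, the nesting of islands plays the role of the parse tree: one large island for the object, with smaller islands for its parts underneath.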

Figure 2 from Hinton’s GLOM paper. The islands of identical vectors (arrows of the same color) at the various levels represent a parse tree.

Geoffrey Hinton

According to Hinton’s longtime friend and collaborator Yoshua Bengio, a computer scientist at the University of Montreal, it would be a feat if GLOM managed to solve the engineering challenge of representing a parse tree in a neural network, and it would be important for making neural networks work properly. “Geoff has produced amazingly powerful intuitions many times in his career, many of which have proven right,” says Bengio. “Hence, I pay attention to them, especially when he feels as strongly about them as he does about GLOM.”

The strength of Hinton’s conviction is rooted not only in the echo chamber analogy, but also in mathematical and biological analogies that inspired and justified some of GLOM’s novel engineering design decisions.

“Geoff is a very rare thinker in that he is able to draw on complex mathematical concepts and integrate them with biological constraints to develop theories,” says Sue Becker, a former student of Hinton’s who is now a computational cognitive neuroscientist at McMaster University. “Researchers who are more narrowly focused on either the mathematical theory or the neurobiology are much less likely to solve the infinitely compelling puzzle of how both machines and humans might learn and think.”

Turning philosophy into engineering

So far, Hinton’s new idea has been well received, especially in some of the world’s largest echo chambers. “I got a lot of likes on Twitter,” he says. And a YouTube tutorial laid claim to the term “MeGLOMania.”

Hinton is the first to admit that GLOM is, for now, little more than philosophical musing (he spent a year as a philosophy undergraduate before switching to experimental psychology). “If an idea sounds good in philosophy, it is good,” he says. “How would you ever have a philosophical idea that just sounds like rubbish but actually turns out to be true? That wouldn’t pass as a philosophical idea.” Science, by comparison, is “full of things that sound like complete rubbish” but turn out to work remarkably well, such as neural networks, he says.

GLOM is designed to sound philosophically plausible. But will it work?