Facebook AI Director Yann LeCun on His Quest to Unleash Deep Learning and Make Machines Smarter


A Q&A with one of the people behind deep learning, with good detail on how machine learning systems are built and trained.

Selected quotes describing deep learning:

The current excitement about AI stems, in great part, from groundbreaking advances involving what are known as “[convolutional neural networks][1]” or “deep learning”.

You could think of Deep Learning as the building of learning machines, say pattern recognition systems or whatever, by assembling lots of modules or elements that all train the same way. So there is a single principle to train everything.
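To make "a single principle to train everything" concrete, here is a minimal sketch in plain Python. It assumes the principle is gradient descent via backpropagation (the standard choice, though the quote does not name it). The module classes `Affine` and `Tanh` are hypothetical, and they work on scalars rather than vectors to keep things short. Every module exposes the same `forward`/`backward` interface, so one training loop handles a stack of any depth.

```python
import math
import random

class Affine:
    """Scalar stand-in for a linear layer: y = w * x + b."""
    def __init__(self, rng):
        self.w = rng.uniform(-1.0, 1.0)
        self.b = 0.0

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x + self.b

    def backward(self, grad_out, lr):
        grad_in = grad_out * self.w       # gradient w.r.t. the input, using pre-update w
        self.w -= lr * grad_out * self.x  # gradient step on w
        self.b -= lr * grad_out           # gradient step on b
        return grad_in

class Tanh:
    """Parameter-free nonlinearity with the same interface."""
    def forward(self, x):
        self.y = math.tanh(x)
        return self.y

    def backward(self, grad_out, lr):
        return grad_out * (1.0 - self.y ** 2)

def build_stack(depth, rng):
    """`depth` hidden (Affine, Tanh) pairs followed by a linear output module."""
    modules = []
    for _ in range(depth):
        modules += [Affine(rng), Tanh()]
    modules.append(Affine(rng))
    return modules

def forward(modules, x):
    for m in modules:
        x = m.forward(x)
    return x

def train_step(modules, x, target, lr):
    """One squared-error SGD step; the same rule updates every module."""
    out = forward(modules, x)
    grad = out - target  # derivative of 0.5 * (out - target)**2 w.r.t. out
    for m in reversed(modules):
        grad = m.backward(grad, lr)
    return 0.5 * (out - target) ** 2

def mean_loss(modules, data):
    return sum(0.5 * (forward(modules, x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = [(x / 10.0, 0.8 * math.tanh(3 * x / 10.0)) for x in range(-10, 11)]
net = build_stack(depth=3, rng=rng)
before = mean_loss(net, data)
for epoch in range(300):
    for x, y in data:
        train_step(net, x, y, lr=0.05)
after = mean_loss(net, data)
print(f"mean loss before: {before:.4f}, after: {after:.4f}")
```

Changing `depth` changes the architecture but not the learning rule, which is the point made a few paragraphs further on about shallow versus deep.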

Previous systems, which I guess we could call “shallow learning systems,” were limited in the complexity of the functions they could compute. So if you want a shallow learning algorithm like a “linear classifier” to recognize images, you will need to feed it with a suitable “vector of features” extracted from the image. But designing a feature extractor “by hand” is very difficult and time-consuming.
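The shallow pipeline described here can be sketched as a hand-designed feature extractor feeding a linear classifier. Everything in this example is an illustrative assumption: the tiny 3x3 "images" of vertical and horizontal bars, the two hand-picked features (strongest column and strongest row), and the perceptron rule used to train the linear classifier.

```python
def vertical_bar(col):
    return [[1 if c == col else 0 for c in range(3)] for _ in range(3)]

def horizontal_bar(row):
    return [[1 if r == row else 0 for _ in range(3)] for r in range(3)]

def extract_features(img):
    """Hand-designed features: strongest column sum, strongest row sum, bias term."""
    max_col = max(sum(img[r][c] for r in range(3)) for c in range(3))
    max_row = max(sum(row) for row in img)
    return [max_col, max_row, 1.0]

def train_perceptron(examples, epochs=20):
    """Linear classifier trained with the perceptron rule; labels are +1 or -1."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(wi * fi for wi, fi in zip(w, feats))
            if label * score <= 0:  # misclassified: move w toward the correct side
                w = [wi + label * fi for wi, fi in zip(w, feats)]
    return w

def predict(w, feats):
    return 1 if sum(wi * fi for wi, fi in zip(w, feats)) > 0 else -1

images = [(vertical_bar(i), 1) for i in range(3)] + [(horizontal_bar(i), -1) for i in range(3)]
examples = [(extract_features(img), label) for img, label in images]
w = train_perceptron(examples)
print([predict(w, f) for f, _ in examples])  # → [1, 1, 1, -1, -1, -1]
```

The classifier only succeeds because the two features were chosen with this exact task in mind. A different task would need a new `extract_features`, and writing that function is the hand-design effort the quote describes.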

An alternative is to use a more flexible classifier, such as a “support vector machine” or a two-layer neural network fed directly with the pixels of the image. The problem is that it’s not going to be able to recognize objects to any degree of accuracy, unless you make it so gigantically big that it becomes impractical.

Shallow learning systems have one or two layers, while deep learning systems typically have five to 20 layers. It is not the learning that is shallow or deep, but the architecture that is being trained.

The type of learning that we use in actual Deep Learning systems is very restricted. What works in practice in Deep Learning is “supervised” learning. You show a picture to the system, and you tell it it’s a car, and it adjusts its parameters to say “car” next time around. Then you show it a chair. Then a person. And after a few million examples, and after several days or weeks of computing time, depending on the size of the system, it figures it out.
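The supervised loop described here (show an example, give its label, adjust the parameters so the right answer is more likely next time) can be sketched with a softmax classifier trained by stochastic gradient descent. This is an assumption-laden toy, not a deep network: the 2-D points stand in for images, and the cluster centers for "car", "chair", and "person" are made up.

```python
import math
import random

LABELS = ["car", "chair", "person"]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class SoftmaxClassifier:
    def __init__(self, n_features, n_classes):
        # one weight vector (with a trailing bias weight) per class
        self.W = [[0.0] * (n_features + 1) for _ in range(n_classes)]

    def probs(self, x):
        xb = x + [1.0]
        return softmax([sum(w * v for w, v in zip(row, xb)) for row in self.W])

    def predict(self, x):
        p = self.probs(x)
        return LABELS[p.index(max(p))]

    def learn(self, x, label, lr=0.1):
        """Show one labeled example and nudge the weights toward that label."""
        xb = x + [1.0]
        p = self.probs(x)
        target = LABELS.index(label)
        for k, row in enumerate(self.W):
            # cross-entropy gradient w.r.t. the class-k score is p_k - [k == target]
            err = p[k] - (1.0 if k == target else 0.0)
            for j in range(len(row)):
                row[j] -= lr * err * xb[j]

rng = random.Random(1)
centers = {"car": (2.0, 0.0), "chair": (-1.0, 2.0), "person": (-1.0, -2.0)}

def sample(label):
    cx, cy = centers[label]
    return [cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]

train = [(sample(lbl), lbl) for _ in range(200) for lbl in LABELS]
rng.shuffle(train)
clf = SoftmaxClassifier(n_features=2, n_classes=3)
for x, lbl in train:
    clf.learn(x, lbl)

test = [(sample(lbl), lbl) for _ in range(50) for lbl in LABELS]
accuracy = sum(clf.predict(x) == lbl for x, lbl in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

A real system replaces the two hand-picked coordinates with raw pixels and a deep network, and needs millions of examples rather than a few hundred, but the show-label-adjust loop has the same shape.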

Unsupervised learning could help “pre-train” very deep networks. We had quite a bit of success with this, but in the end, what ended up actually working in practice was good old supervised learning, but combined with convolutional nets, which we had over 20 years ago.

We now have unsupervised techniques that actually work. The problem is that you can beat them by just collecting more data, and then using supervised learning. This is why in industry, the applications of Deep Learning are currently all supervised. But it won’t be that way in the future.

The bottom line is that the brain is much better than our model at doing unsupervised learning. That means that our artificial learning systems are missing some very basic principles of biological learning.

Currently with Deep Learning systems, it’s like learning a motor skill. The way we train them is similar to the way you train yourself to ride a bike. You learn a skill, but there’s not a huge amount of factual memory or knowledge involved.

But there are other types of things that you learn where you have to remember facts, where you have to remember things and store them.

A particular model of this type called “Memory Network” was recently proposed by Facebook scientists Jason Weston, Sumit Chopra, and Antoine Bordes. A somewhat related idea called the “Neural Turing Machine” was also proposed by scientists at Google DeepMind.
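The quote does not say how these models store facts, so here is a minimal sketch of the general idea: facts live as vectors in an explicit memory, a query is scored against every slot, and the answer is read back from the best-matching slots. The original model by Weston, Chopra, and Bordes picks supporting memories with a hard argmax. The softmax-weighted read below is closer to the later end-to-end variant, and similar in spirit to the Neural Turing Machine's content-based addressing. The `Memory` class and the hand-set vectors are hypothetical; a trained model would learn its embeddings.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class Memory:
    def __init__(self):
        self.keys = []    # vectors used to decide which slot is relevant
        self.values = []  # vectors returned when a slot is read

    def write(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def read(self, query):
        """Soft read: attention weights over slots, then a weighted sum of values."""
        weights = softmax([dot(query, k) for k in self.keys])
        dim = len(self.values[0])
        out = [sum(w * v[i] for w, v in zip(weights, self.values)) for i in range(dim)]
        return out, weights

# Hand-set toy embeddings for three stored facts (hypothetical).
mem = Memory()
mem.write(key=[5.0, 0.0, 0.0], value=[1.0, 0.0])  # fact A
mem.write(key=[0.0, 5.0, 0.0], value=[0.0, 1.0])  # fact B
mem.write(key=[0.0, 0.0, 5.0], value=[0.5, 0.5])  # fact C

answer, weights = mem.read(query=[0.0, 1.0, 0.0])  # a question that matches fact B
print([round(w, 3) for w in weights], [round(a, 3) for a in answer])
```

Because the read is a smooth function of the query and the stored vectors, the whole lookup can be trained with the same gradient-based methods as the rest of the network.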

You’ll see better question-answering and dialog systems, so you can converse with your computer; you can ask questions and it will give you answers that come from some knowledge base. You will see better machine translation. Oh, and you will see self-driving cars and smarter robots. Self-driving cars will use convolutional nets.

In Deep Learning systems, entities are represented by large vectors of numbers that are learned from data and represent their properties. Learning to reason comes down to learning functions that operate on these vectors.
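As one concrete instance of "functions that operate on these vectors", here is a sketch in the style of TransE, a knowledge-base embedding method co-authored by Antoine Bordes, in which a relation is a vector that translates a head entity toward its tail. This is a swapped-in example, not necessarily what the quote has in mind. The entity and relation vectors are hand-set for illustration; in a real system they would be learned from data.

```python
import math

# Hand-set 2-D vectors for illustration only; a real system learns these.
ENTITIES = {
    "Paris":   [1.0, 0.0],
    "France":  [1.0, 1.0],
    "Berlin":  [3.0, 0.0],
    "Germany": [3.0, 1.0],
    "Rome":    [5.0, 0.0],
    "Italy":   [5.0, 1.0],
}
RELATIONS = {"capital_of": [0.0, 1.0]}

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def score(head, relation, tail):
    """TransE-style plausibility: higher when head + relation lands near tail."""
    h, r, t = ENTITIES[head], RELATIONS[relation], ENTITIES[tail]
    return -distance([hi + ri for hi, ri in zip(h, r)], t)

def answer(head, relation):
    """Answer (head, relation, ?) by finding the entity that best completes it."""
    candidates = [e for e in ENTITIES if e != head]
    return max(candidates, key=lambda t: score(head, relation, t))

print(answer("Berlin", "capital_of"))  # → Germany
print(score("Rome", "capital_of", "Italy") > score("Rome", "capital_of", "France"))  # → True
```

Here the "reasoning" step is nothing more than vector addition followed by a nearest-neighbor search; learning would adjust the vectors so that true facts score higher than false ones.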

I think a form of common sense could be acquired through the use of predictive unsupervised learning. For example, I might get the machine to watch lots of videos where objects are being thrown or dropped. The way I would train it would be to show it a piece of video, and then ask it, “What will happen next? What will the scene look like a second from now?” By training the system to predict what the world is going to be like a second, a minute, an hour, or a day from now, you can train it to acquire good representations of the world.
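A tiny version of "predict what happens next" can be sketched without any human labels, because the next frame of the sequence is itself the training target. In this sketch the "videos" are hypothetical simulated heights of thrown objects (no air drag, no ground collision), and the predictor is a linear model fitted by least squares rather than a deep network watching pixels.

```python
import random

G, DT = 9.81, 0.1  # gravity (m/s^2) and time between "frames" (s)

def throw(h0, v0, steps):
    """Heights of an object thrown upward at speed v0 from height h0."""
    return [h0 + v0 * (k * DT) - 0.5 * G * (k * DT) ** 2 for k in range(steps)]

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def fit_predictor(videos):
    """Least squares for y[t+1] ~ a*y[t] + b*y[t-1] + c; targets come from the data."""
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for ys in videos:
        for t in range(1, len(ys) - 1):
            feats = [ys[t], ys[t - 1], 1.0]
            for i in range(3):
                Atb[i] += feats[i] * ys[t + 1]
                for j in range(3):
                    AtA[i][j] += feats[i] * feats[j]
    return solve3(AtA, Atb)

rng = random.Random(0)
videos = [throw(rng.uniform(10, 20), rng.uniform(-2, 8), steps=12) for _ in range(20)]
a, b, c = fit_predictor(videos)
print(f"a={a:.3f}, b={b:.3f}, c={c:.4f}")  # physics implies a=2, b=-1, c=-G*DT**2

def predict_next(prev, curr):
    """Predict the next height from the two most recent ones."""
    return a * curr + b * prev + c
```

Nobody told the model about gravity; the learned coefficients recover the constant-acceleration rule because that is what best predicts the next frame. This is the kind of "representation of the world" the quote hopes prediction can teach, in miniature.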

[1]: http://en.wikipedia.org/wiki/Convolutional_neural_network