
"AI Godfather" Hinton's Nobel Prize speech tops high-impact journal for the first time. By avoiding formulas, he enables the whole audience to instantly understand the "Boltzmann machine".

新智元 · 2025-09-03 19:24
It is well deserved that the "godfather of AI," Geoffrey Hinton, won the Nobel Prize. Now his striking lecture on the "Boltzmann machine" has been published in an APS journal. What exactly was said about this "historical catalyst" that sparked the deep learning revolution?

On December 8, 2024, Hinton, the Nobel laureate in Physics, took the stage and delivered a speech titled "Boltzmann Machines".

At that time, the Aula Magna auditorium at Stockholm University was packed, and the global spotlight was focused here.

In plain, accessible terms, he recounted how he and John Hopfield used neural networks to drive fundamental discoveries in machine learning.

Now the core content of Hinton's lecture has been formally published, on August 25, in Reviews of Modern Physics, a journal of the American Physical Society (APS).

Paper link: https://journals.aps.org/rmp/pdf/10.1103/RevModPhys.97.030502

In the 1980s, there were two promising techniques for computing gradients:

One was the backpropagation algorithm, which has since become the core engine of deep learning and is now almost ubiquitous.

The other was the Boltzmann machine learning algorithm, which has fallen out of use and gradually faded from view.

This time, the focus of Hinton's speech was the "Boltzmann Machine".

At the start, he joked that he was about to do something "stupid": explain complex technical concepts to everyone without using a single formula.

Hopfield Network

Finding the Lowest Energy Point

What is the "Hopfield Network"?

Hinton started with a simple binary neural network and introduced the core idea of the "Hopfield Network".

Each neuron has only two states, 1 or 0. Most importantly, the neurons are connected by symmetric weights.

The global state of the entire neural network is called a "configuration" and has a "goodness".

Its "goodness" is determined by the sum of the weights between all active neurons. For example, in the figure above, all the red boxes have a total weight of 4.

This is the goodness of the network configuration, and the energy is the negative of the goodness.

The whole point of the "Hopfield Network" is that each neuron can decide how to lower the energy using only local computations.

Here, energy represents "badness". Whether a neuron should turn on or off therefore depends entirely on the sign of its total weighted input.

Through continuous updates of the neuron states, the network will eventually stabilize at the "lowest energy point".

However, this is not the only low-energy point: a "Hopfield Network" can have many energy minima. Which one it finally settles in depends on the initial state and on the random order in which neurons are updated.

Below is a better energy minimum. When the configuration shown on the right is active, its goodness is 3 + 3 - 1 = 5, and the energy is -5.
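To make the bookkeeping concrete, here is a minimal sketch in Python (not code from the lecture): the weights are illustrative values chosen so that the all-on configuration reproduces the 3 + 3 - 1 example, "goodness" is the sum of weights between active neurons, energy is its negative, and each neuron flips according to the sign of its total weighted input.

```python
import numpy as np

# Symmetric weights between three neurons (illustrative values chosen so that
# the all-on configuration has goodness 3 + 3 - 1 = 5, as in the example).
W = np.array([[ 0.0,  3.0, -1.0],
              [ 3.0,  0.0,  3.0],
              [-1.0,  3.0,  0.0]])

def goodness(s, W):
    """Sum of the weights between all pairs of active (state 1) neurons."""
    return 0.5 * s @ W @ s          # the 0.5 counts each pair once

def energy(s, W):
    """Energy is simply the negative of the goodness."""
    return -goodness(s, W)

def settle(s, W, rng, sweeps=10):
    """Asynchronous updates: a neuron turns on exactly when its total weighted
    input from the active neurons is positive. Each update can only keep or
    lower the energy, so the network slides into an energy minimum."""
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1 if W[i] @ s > 0 else 0
    return s

rng = np.random.default_rng(0)
s0 = np.array([0, 1, 1])            # a starting configuration (goodness 3, energy -3)
print("start: state", s0, "energy", energy(s0, W))
s1 = settle(s0, W, rng)
print("final: state", s1, "energy", energy(s1, W))   # all-on: goodness 5, energy -5
```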

The charm of the "Hopfield Network" lies in its ability to associate the lowest energy points with memories.

Hinton vividly described it as "when you input an incomplete memory fragment and then continuously apply the binary decision rule, the network can complete the full memory".

Therefore, when the "lowest energy point" represents a memory, the process of stabilizing the network at the lowest energy point is the so - called "content - addressable storage".

In other words, you can access an item in memory by activating only part of it; applying the rule then lets the network fill in the rest.
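A minimal sketch of this "complete the memory from a fragment" idea follows. The stored patterns and the Hebbian-style recipe for the weights are illustrative assumptions, not the specific network from the lecture; the point is only that repeatedly applying the binary decision rule fills in the missing part of a cue.

```python
import numpy as np

# Two made-up binary "memories" over eight neurons (illustrative only).
memories = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                     [1, 0, 1, 0, 1, 0, 1, 0]])

# Hebbian-style symmetric weights: map states to -1/+1 and sum the outer
# products, so neurons that agree within a memory attract each other and
# neurons that disagree repel each other.
V = 2 * memories - 1
W = (V.T @ V).astype(float)
np.fill_diagonal(W, 0.0)

def complete(cue, W, sweeps=10, seed=0):
    """Repeatedly apply the binary decision rule until the network settles,
    which should be the stored memory closest to the cue."""
    rng = np.random.default_rng(seed)
    s = cue.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1 if W[i] @ (2 * s - 1) > 0 else 0
    return s

cue = np.array([1, 1, 0, 0, 0, 0, 0, 0])     # a fragment of the first memory
print("cue      :", cue)
print("completed:", complete(cue, W))         # reproduces memories[0]
```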

Not Only for Memory Storage

But Also for Interpreting "Sensory Inputs"

Next, Hinton went on to share an innovative application of the "Hopfield Network" that he developed with Terrence Sejnowski (a former student of Hopfield):

using it to construct interpretations of sensory input, not just to store memories.

They divided the network into "visible neurons" and "hidden neurons".

The former receive the sensory input, such as a binary image; the latter are used to construct an interpretation of that input. The energy of a given configuration of the network represents the badness of the interpretation, and the goal is an interpretation with low energy.
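A toy illustration of this split, with weights invented purely for demonstration (this is not the network from the lecture): clamp the visible neurons to an "image", enumerate every configuration of the hidden neurons, and rank the joint states by energy. The lowest-energy hidden configuration is the least bad interpretation.

```python
import itertools
import numpy as np

# Two visible units (the clamped "image") and two hidden units (the
# interpretation). Symmetric weights, values invented for illustration.
#              v0    v1    h0    h1
W = np.array([[ 0.0,  0.0,  2.0, -1.0],   # v0
              [ 0.0,  0.0,  1.0,  2.0],   # v1
              [ 2.0,  1.0,  0.0, -2.0],   # h0
              [-1.0,  2.0, -2.0,  0.0]])  # h1

def energy(state):
    return -0.5 * state @ W @ state       # negative goodness, as before

visible = np.array([1, 1])                # the sensory input, held fixed
ranked = sorted(
    (energy(np.concatenate([visible, h])), tuple(h))
    for h in (np.array(c) for c in itertools.product([0, 1], repeat=2))
)
for e, h in ranked:
    print(f"hidden configuration {h}: energy {e:+.1f}")
# The top line is the interpretation the network should settle into.
```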

Hinton took the classic ambiguous line drawing, the Necker cube, as an example of how the network copes with the ambiguity of visual information.

For the following drawing, some people may see it as a "convex polyhedron", while others may see it as a "concave polyhedron".

So how can we get the neural network to produce two different interpretations of this line drawing? Before that, we need to ask: what does a line in the image tell us about a three-dimensional edge?

Visual Interpretation: From 2D to 3D

Imagine that you are looking out of a window at the world outside and then drawing the outline of the scenery you see on the glass.

The black line on the glass is then an edge that you drew.

And the two red lines are the lines of sight starting from your eyes and passing through the two ends of this black line.

So the question is: what kind of edge in the real world forms this black line?

In fact, there are many possibilities: many different three-dimensional edges would all produce exactly the same line in the image.

The hardest problem for the visual system is therefore to work backwards from this two-dimensional line and decide which edge actually exists in reality.

To address this, Hinton and Sejnowski designed a network that converts the lines in an image into the activation of "line neurons".

Each line neuron is then connected by excitatory connections to the "three-dimensional edge neurons" (shown in green), and those edge neurons inhibit one another so that only one interpretation is active at a time.

In this way, many principles of perceptual optics are built into the network.

Next, Hinton applied this scheme to all the neurons. The question is: which edge neurons should be activated?

To answer this question, more information is needed.

When humans interpret images, they follow specific principles. For example, when two lines meet in the image, we assume they also meet at the same point in three-dimensional space, at the same depth.

In addition, the brain tends to assume that edges meet at right angles.

With suitably chosen connection strengths, the network can form two stable states, corresponding to the two three-dimensional interpretations of the Necker cube: the concave polyhedron and the convex polyhedron.
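Here is a toy sketch of that bistability, with made-up connection strengths rather than the ones from the actual network: a clamped "line neuron" excites two mutually inhibitory "edge neurons", one per interpretation, and deterministic settling from different random starts ends in one of two stable states.

```python
import numpy as np

# Unit 0: a line neuron, clamped on by the image.
# Unit 1: edge neurons for the "convex" reading; unit 2: the "concave" reading.
# Invented weights: the line excites both readings, which inhibit each other.
W = np.array([[ 0.0,  2.0,  2.0],
              [ 2.0,  0.0, -3.0],
              [ 2.0, -3.0,  0.0]])

def settle(seed):
    rng = np.random.default_rng(seed)
    s = np.array([1, rng.integers(2), rng.integers(2)])   # random initial reading
    for _ in range(5):                                     # a few full sweeps
        for i in rng.permutation([1, 2]):                  # line neuron stays clamped
            s[i] = 1 if W[i] @ s > 0 else 0
    return s

for seed in range(4):
    print("seed", seed, "->", settle(seed))
# Every run ends in [1 1 0] (the "convex" reading) or [1 0 1] (the "concave" one);
# which of the two depends on the starting state and the update order.
```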

This approach to visual interpretation raises two core problems:

Search problem: The network may get stuck in a local optimum and stay at a poor interpretation, unable to jump to a better one.

Learning problem: How to make the network automatically learn the connection weights instead of setting them manually.

Search Problem: Neurons with Noise

For the "search problem", the most basic solution is to introduce neurons with noise, that is, "stochastic binary neurons".

The states of these neurons are binary (either 1 or 0), but their decisions are made probabilistically.

A strong positive input will turn them on; a strong negative input will turn them off; and an input close to zero introduces randomness.

Noise allows the network to occasionally move uphill in energy, so it can escape a poor interpretation and find a better one instead of getting stuck in the first valley it reaches.
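A sketch of this stochastic binary rule: the unit still ends up as 1 or 0, but it turns on with a probability given by a logistic function of its total input. The temperature parameter below is my addition, just to show how the amount of noise can be dialed up or down; strong positive input almost always turns the unit on, strong negative input almost always turns it off, and input near zero makes the decision close to a coin flip.

```python
import numpy as np

def p_on(total_input, temperature=1.0):
    """Probability that a stochastic binary neuron turns on: a logistic function
    of its total weighted input. Higher temperature means a noisier decision."""
    return 1.0 / (1.0 + np.exp(-total_input / temperature))

def sample_state(total_input, rng, temperature=1.0):
    """The resulting state is still strictly binary: 1 with probability p_on."""
    return int(rng.random() < p_on(total_input, temperature))

rng = np.random.default_rng(0)
for x in (-6.0, -0.5, 0.0, 0.5, 6.0):
    samples = [sample_state(x, rng) for _ in range(1000)]
    print(f"input {x:+.1f}: p_on = {p_on(x):.3f}, "
          f"observed fraction on = {np.mean(samples):.3f}")
```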

Boltzmann Distribution + Machine Learning

By randomly updating the hidden neurons, the neural network will eventually approach the so-called "thermal equilibrium".

Once thermal equilibrium is reached, the states of the hidden neurons form an interpretation of the input.

At thermal equilibrium, low-energy states (corresponding to better interpretations) have a higher probability of occurring.

Taking the Necker cube as an example, the network will ultimately tend to choose a more reasonable three - dimensional interpretation.

Of course, thermal equilibrium does not mean that the system stays in a single state. Instead, the probability distribution of all possible configurations is stable and follows the Boltzmann distribution.

In the Boltzmann distribution, once the system reaches thermal equilibrium, the probability of it being in a certain configuration is completely determined by the energy of that configuration.

Moreover, the system has a higher probability of being in a low-energy configuration.

To understand thermal equilibrium, physicists have a trick: just imagine a huge "ensemble" made up of a vast number of identical networks.

Hinton said, "Imagine countless identical Hopfield networks, each starting from a random state, and through random updates, the proportion of configurations gradually stabilizes."

Similarly, low-energy configurations account for a larger share of the "ensemble".

In summary, the principle of the Boltzmann distribution is that low-energy configurations are much more likely to occur than high-energy configurations.
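To make "thermal equilibrium" concrete, here is a small sketch with a tiny invented weight matrix and temperature fixed at 1: keep updating randomly chosen units with the stochastic rule, count how often each configuration is visited, and compare those visit frequencies with the Boltzmann probabilities, which depend only on each configuration's energy.

```python
import itertools
import numpy as np

# Three units with arbitrary symmetric weights (illustration only).
W = np.array([[ 0.0,  1.5, -1.0],
              [ 1.5,  0.0,  0.5],
              [-1.0,  0.5,  0.0]])

def energy(s):
    return -0.5 * s @ W @ s

configs = [np.array(c) for c in itertools.product([0, 1], repeat=3)]

# Exact Boltzmann distribution (temperature 1): P(config) proportional to exp(-energy).
boltzmann = np.array([np.exp(-energy(s)) for s in configs])
boltzmann /= boltzmann.sum()

# Empirical visit frequencies from long-run stochastic updating.
rng = np.random.default_rng(0)
s = np.array([0, 0, 0])
counts = np.zeros(len(configs))
for _ in range(200_000):
    i = rng.integers(3)
    prob_on = 1.0 / (1.0 + np.exp(-(W[i] @ s)))   # stochastic binary rule
    s[i] = int(rng.random() < prob_on)
    counts[s[0] * 4 + s[1] * 2 + s[2]] += 1        # index matches `configs` order
freq = counts / counts.sum()

for c, p, f in zip(configs, boltzmann, freq):
    print(c, f"Boltzmann {p:.3f}   observed {f:.3f}")
```

After enough updates the observed frequencies sit right next to the Boltzmann column: the distribution over configurations has stabilized even though the state itself keeps changing.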

In the "Boltzmann Machine", the goal of learning is to ensure that when the network generates images, which can essentially be called "dreaming, random imagination", these match the impressions formed when it perceives real images in the "waking" state.

If this match can be achieved, the states of the hidden neurons effectively capture the deep underlying causes of the images.
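That matching requirement leads to a strikingly simple weight update, sketched below in a simplified form (the numbers are toy values, and in a real Boltzmann machine the "dreaming" statistics would be re-measured at thermal equilibrium after every update rather than held fixed): each weight grows in proportion to how often its two neurons are on together when the visible units are clamped to real data, and shrinks in proportion to how often they are on together when the network generates freely.

```python
import numpy as np

def boltzmann_update(w, coactive_wake, coactive_dream, lr=0.01):
    """One learning step: move each weight toward agreement between the
    "waking" co-activation statistics (visible units clamped to data) and
    the "dreaming" statistics (network running freely)."""
    return w + lr * (coactive_wake - coactive_dream)

# Toy co-activation statistics for a three-neuron network (illustrative values):
# entry [i, j] = fraction of time neurons i and j are on together.
coactive_wake  = np.array([[0.0, 0.8, 0.1],
                           [0.8, 0.0, 0.6],
                           [0.1, 0.6, 0.0]])
coactive_dream = np.array([[0.0, 0.3, 0.4],
                           [0.3, 0.0, 0.2],
                           [0.4, 0.2, 0.0]])

w = np.zeros((3, 3))
for _ in range(100):
    w = boltzmann_update(w, coactive_wake, coactive_dream)
print(w)
# Pairs that are on together more often when "awake" than when "dreaming"
# end up with stronger (more positive) weights, and vice versa.
```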