
AI godfather Hinton reveals, for the first time, the story of the auction a decade ago: I had already decided in advance that Google would win.

Xinzhiyuan, 2025-12-22 07:24
The whole game was under control from the start.

The "meeting of two gods" in the AI world has arrived! During a fireside chat at NeurIPS 2025, AI godfather Hinton and Jeff Dean were on stage together, personally revealing the "good old days" of the AI revolution and many little - known anecdotes.

The interview that caused a sensation at NeurIPS 2025 is finally out!

AI godfather Hinton and Jeff Dean, chief scientist of Google DeepMind, two key figures in the AI world and old friends who have collaborated for many years, sat down together.

On stage, Hinton opened with a pointed question:

Does Google regret publishing the Transformer paper?

Jeff Dean gave a straightforward response, "No regrets! Because it has had a huge impact on the world."

Hinton also publicly revealed that his epiphany about scaling came from a talk by Ilya Sutskever.

Over the nearly hour-long conversation, the two giants covered everything from the early breakthroughs in machine learning to the challenges and opportunities shaping the field today.

They also shared some very entertaining anecdotes:

From running AlexNet on two GPUs in a bedroom to the early days of Google Brain.

The AI godfather's scaling epiphany came from Ilya

The conversation started with an interesting common ground:

Both Geoff and Jeff are fascinated by "backpropagation".

Although the paper on this concept was officially published in Nature in 1986, it was actually proposed as early as 1982.

Paper address: https://www.nature.com/articles/323533a0

Jeff Dean recalled his undergraduate thesis ——

In 1990, after taking a course on parallel algorithms that spent only about a week on neural networks, he was hooked.

So he applied to Professor Vipin Kumar at the University of Minnesota to write an honors thesis on "parallel algorithms for training neural networks".

At the time, Jeff Dean used a hypercube computer with 32 processors, figuring that with 32 times the computing power he could build an amazing neural network.

Paper address: https://drive.google.com/file/d/1I1fs4sczbCaACzA9XwxR3DiuXVtqmejL/view?pli=1

But reality taught him a lesson.

He had scaled up the number of processors (the compute) without scaling up the model to match: splitting a layer of just 10 neurons across 32 processors produced abysmal performance.

In that thesis, Jeff Dean also worked out early versions of two now-familiar ideas, "data parallelism" and "model parallelism" (he called the former "pattern partitioning" at the time).
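For readers who haven't met the two terms, here is a minimal NumPy sketch (purely illustrative, not Dean's thesis code) of the distinction: data parallelism gives each worker a slice of the training examples and a full copy of the weights, while model parallelism gives each worker a slice of the neurons and all of the examples.

```python
import numpy as np

# Illustrative sizes only.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))   # 128 training examples ("patterns"), 16 features
W = rng.normal(size=(16, 10))    # one layer with 10 output neurons

def forward(x, w):
    return np.tanh(x @ w)

num_workers = 4

# Data parallelism ("pattern partitioning"): split the examples, replicate the weights.
data_shards = np.array_split(X, num_workers)
data_parallel_out = np.concatenate([forward(shard, W) for shard in data_shards])

# Model parallelism: split the neurons (columns of W); every worker sees every example.
weight_shards = np.array_split(W, num_workers, axis=1)
model_parallel_out = np.concatenate([forward(X, w) for w in weight_shards], axis=1)

# Both partitionings reproduce the single-machine forward pass.
assert np.allclose(data_parallel_out, forward(X, W))
assert np.allclose(model_parallel_out, forward(X, W))
```

In the thesis setup the problem was exactly this: a layer of 10 neurons split across 32 processors leaves each processor almost nothing to compute, so communication dominates, and the partitioning only pays off when the model (or batch) is large relative to the number of workers.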

On the other hand, Hinton shared his "late" awakening to the importance of computing power. He said, "I should have realized the importance of computing power in the late 1980s."

At that time, there were two world-class teams: the ICSI group at Berkeley and a group at Cambridge.

They used parallel computing to build better acoustic models for speech, setting a new industry state of the art and outperforming neural networks trained by conventional means.

But as the models grew, the programming and hardware became ever more complex, and the teams didn't persevere.

It wasn't until 2014, after hearing a talk by Ilya Sutskever, that Hinton finally woke up to it:

Scaling is crucial, and this trend will continue.

The birth of AlexNet

ML conquers "image recognition" overnight

Next, the focus of the conversation shifted to AlexNet in 2012, the moment of the AI explosion.

Hinton recalled that Vlad Mnih had first achieved great success using NVIDIA GPUs for road detection in aerial imagery, showing that multi-layer networks were far superior to single-layer ones.

AlexNet is an 8-layer neural network

At the time, Hinton applied for a funding extension for that project but was turned down by the reviewers:

This project is not worth funding, because it will not have any industrial impact.

On stage, Hinton joked that he would love to tell them that this technology accounted for 80% of the growth of the US stock market last year.

Later, his student Alex Krizhevsky was working on a small-image recognition task, training at the time on the MNIST dataset.

But Alex's attempt failed: Hinton found that the weight-decay parameter had been set incorrectly and fixed the problem.
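As a side note on what went wrong: weight decay is the coefficient that shrinks the weights a little on every update, and a badly chosen value can quietly ruin training. A minimal sketch of where it enters a plain SGD step, with made-up numbers rather than anything from Alex's run:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01, weight_decay=1e-4):
    # Standard L2-style weight decay: the decay term is added to the gradient,
    # pulling every weight toward zero by lr * weight_decay * w per step.
    return w - lr * (grad + weight_decay * w)

w = np.ones(4)
grad = np.zeros(4)                                   # pretend the loss gradient is zero
print(sgd_step(w, grad, weight_decay=1e-4))          # weights barely move
print(sgd_step(w, grad, weight_decay=10.0))          # weights shrink by 10% in a single step
```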

At that time, Ilya said, "Why not directly use ImageNet? Such a large dataset will definitely work. We have to do it before Yann LeCun."

Meanwhile, LeCun had also been trying to get the postdocs and students in his lab to apply convolutional neural networks to ImageNet, but everyone felt they had more important things to do.

So Ilya took charge of the data preprocessing, standardizing the images to a fixed size, and the results were very good.
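The article doesn't spell out the pipeline, but "standardizing to a fixed size" usually means rescaling and cropping every image to the same resolution. A hypothetical sketch (not the actual AlexNet preprocessing code) might look like this:

```python
from PIL import Image
import numpy as np

def standardize(img: Image.Image, size: int = 256) -> np.ndarray:
    """Rescale the short side to `size`, then take a centred size-by-size crop."""
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    left = (img.width - size) // 2
    top = (img.height - size) // 2
    img = img.crop((left, top, left + size, top + size))
    return np.asarray(img, dtype=np.float32) / 255.0   # fixed shape: (size, size, 3)

# Example with a synthetic non-square image.
fake = Image.fromarray(np.random.randint(0, 255, (375, 500, 3), dtype=np.uint8))
print(standardize(fake).shape)  # (256, 256, 3)
```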

Hinton joked, "Next, I made the most successful management decision of my life."

For every week that ImageNet performance improved by another 1%, Alex was allowed to keep putting off the review paper he was supposed to write.

As a result, there were continuous successful iterations week after week.

As for the training hardware, it was the now-famous pair of NVIDIA GTX 580 GPUs.

Alex trained AlexNet on those two GPUs in his own bedroom. Hinton quipped, "Of course, we paid for the GPUs and Alex's parents paid for the electricity. It was all to save the University of Toronto money."

A casual chat in the tea room

Gives birth to "Google Brain"

Around the same time, a brand-new team at Google, Google Brain, was in the making.

Jeff Dean recalled that Google Brain grew out of a chance encounter and chat in the tea room.

That day, Andrew Ng, then a Stanford professor who spent one day a week at Google, happened to bump into him.

Andrew mentioned, "My students have already achieved good results using neural networks."

That immediately clicked for Jeff Dean, who thought: we have plenty of CPUs, so why not train an extremely large neural network?

So they built a training system that supported both model parallelism and data parallelism and scaled it to thousands of machines.

In this famous experiment, they conducted unsupervised learning on 10 million YouTube video frames, enabling the neural network to learn to recognize "cats".

They didn't use convolution but adopted a "locally connected" approach for vision, resulting in 2 billion parameters.

To complete this training, they used 16,000 CPU cores.
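A "locally connected" layer works like a convolution in that each output unit looks only at a small patch of the image, but it does not share weights across positions, which is why the parameter count balloons. A toy NumPy sketch (made-up sizes, not the Google Brain model) of the distinction:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8           # input height/width
K = 3               # receptive-field (patch) size
out = H - K + 1     # output positions per dimension (valid padding, stride 1)

image = rng.normal(size=(H, W))
conv_filter = rng.normal(size=(K, K))               # convolution: one shared filter
local_filters = rng.normal(size=(out, out, K, K))   # locally connected: one filter per position

conv_out = np.empty((out, out))
local_out = np.empty((out, out))
for i in range(out):
    for j in range(out):
        patch = image[i:i + K, j:j + K]
        conv_out[i, j] = np.sum(patch * conv_filter)           # weights shared everywhere
        local_out[i, j] = np.sum(patch * local_filters[i, j])  # weights untied per location

print(conv_filter.size, local_filters.size)  # 9 parameters vs. 324 for the same geometry
```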

Jeff said, "We've observed that the larger the model, the better the effect. It's just that we hadn't formally summarized it as Scaling Laws at that time."

We even had a catch - phrase, which was somewhat similar to Scaling Laws in a sense: Larger models, more data, and more computing power.

In other words, a year before AlexNet was born, Google Brain had already glimpsed what would later be called the scaling laws.

A 64-year-old intern joins Google

In the summer of 2012, Andrew Ng switched to the educational platform Coursera because he thought that was the future.

So, he recommended Hinton to take over.

Interestingly, Hinton had originally wanted to join as a visiting scientist, but to get paid he would have had to work full-time for six months.

So 64-year-old Hinton became a Google "intern", and Jeff Dean's intern at that.

After joining Google, Hinton had to take training courses with other interns.

A large room was filled with students, some from IIT and some from Tsinghua. Anyway, there were a lot of very smart people.

On the first day of training, the lecturer said, "Log in with your LDAP and OTP." Hinton was completely confused on the spot. What was LDAP? What was OTP?

After about ten minutes, they decided that one of the teaching assistants would be assigned to look after him.

The other students were looking around, staring at this person who obviously didn't know anything and was three times their age. To be honest, it was a bit embarrassing.

What was even more embarrassing was that during lunchtime, Hinton happened to run into an undergraduate student he had taught before.

It wasn't until Hinton's second day on the job that he and Jeff Dean met in person for the first time, at a Vietnamese restaurant in Palo Alto.

In the casino, Google was bound to win