
Musk trained the smartest AI in history with 200,000 GPUs. Grok 4 has returned to the top of the world, leaving human PhDs defeated across the board

新智元 2025-07-11 08:18
Musk is fighting a do-or-die battle.

Elon Musk has finally played his trump card: Grok 4. Billed as the smartest AI in the world, it shot to the top of the global charts as soon as it went live, outperforming every other large model. The newly launched Grok Heavy comes with a monthly subscription fee of $300. Musk has predicted that Grok will discover new physics next year.

Elon Musk's all-out effort has achieved a great victory!

At the press conference on July 10th, xAI's bombshell, Grok 4, finally made its long-awaited debut.

It can be said to be the smartest AI in the world!

It not only surpasses most human postgraduate students but is even better than many doctoral students.

The press conference lasted for an hour. Here is a brief summary:

Now, users can access SuperGrok. The regular version costs $30 per month, and the Heavy version costs $300 per month.

Meanwhile, the Grok 4 API has been officially opened to all developers and will be available on third-party cloud platforms.
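For developers, a call can look something like the minimal sketch below, written against an OpenAI-compatible client; the base URL `https://api.x.ai/v1` and the model name `grok-4` are assumptions based on xAI's published API conventions, not details confirmed in this article.

```python
# Minimal sketch of calling the Grok 4 API via an OpenAI-compatible client.
# ASSUMPTIONS: the base_url "https://api.x.ai/v1" and model name "grok-4"
# follow xAI's published API conventions; check the official docs before use.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],   # your xAI API key
    base_url="https://api.x.ai/v1",      # assumed xAI endpoint
)

response = client.chat.completions.create(
    model="grok-4",                      # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the idea of test-time compute in two sentences."},
    ],
)
print(response.choices[0].message.content)
```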

Although Musk has suffered a setback in the political arena, he has made a comeback in the AI field.

Sure enough, Elon Musk is still the legendary undefeatable man.

200,000 GPUs, Outstanding Performance in HLE

In various exams and benchmark tests, Grok 4 has achieved amazing results.

For example, it consistently gets a perfect score on the SAT, even on questions it has never seen before.

On the GRE, it scores near-perfectly across every subject area, whether humanities, languages, mathematics, physics, or engineering.

It can be said that in all subject areas, Grok 4 is smarter than almost all human postgraduate students.

How did it achieve this? Elon Musk revealed the secret.

First of all, Grok 4's training was scaled to 100 times that of Grok 2.

From Grok 2 to Grok 3, xAI invested mainly in pre-training compute. But from Grok 3 to Grok 4, a large amount of compute has gone into reasoning and reinforcement learning.

In training Grok 2, the team scaled up pre-training substantially for the first time.

This made them realize that by handling data ablations, infrastructure, and algorithms more carefully, they could increase the scale of pre-training another 10 times and build a top-tier pre-trained foundation model!

Netizens exclaimed: Did xAI really invest as much computing power in reinforcement learning as in pre-training? That's crazy!

This is why xAI spent a huge amount of money to build the world-class supercomputer Colossus with 100,000 H100 GPUs.

With reward data whose results can be verified, the team could train the model to think, reason, and correct its own mistakes from first principles. This is where Grok 3's reasoning ability came from.
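As a rough illustration of what a verifiable reward means, here is a minimal sketch assuming math-style problems that ship with known numeric answers; the answer-parsing rule and tolerance are illustrative choices, not xAI's actual reward pipeline.

```python
# Sketch of a verifiable reward for RL on math-style problems.
# ASSUMPTION: each problem ships with a known ground-truth answer; the parsing
# and tolerance below are illustrative, not xAI's actual reward pipeline.
import re

def extract_final_answer(model_output: str) -> float | None:
    """Pull the last number out of the model's response as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    return float(numbers[-1]) if numbers else None

def verifiable_reward(model_output: str, ground_truth: float, tol: float = 1e-6) -> float:
    """Return 1.0 if the model's final answer matches the ground truth, else 0.0."""
    answer = extract_final_answer(model_output)
    if answer is None:
        return 0.0
    return 1.0 if abs(answer - ground_truth) <= tol else 0.0

# A correct chain of reasoning ending in the right number earns reward 1.0.
print(verifiable_reward("24 * 7 = 168, so the answer is 168", ground_truth=168.0))  # 1.0
print(verifiable_reward("I think it is 170", ground_truth=168.0))                   # 0.0
```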

So, what will happen if all 200,000 GPUs of the Colossus supercomputer are put into use?

The answer is the birth of Grok 4!

In the "Last Human Exam" (HLE), Grok 4 achieved outstanding results.

The HLE has a total of 2,500 questions covering multiple disciplines. When it was first released earlier this year, most models scored only single-digit accuracy on it.

The reason is that the HLE questions are extremely difficult: they include a math question about natural transformations in category theory, an organic chemistry question about electrocyclic reactions, and a linguistics question about distinguishing closed and open syllables in Hebrew.

Obviously, these questions are at the doctoral level or even more advanced.

Almost no human can answer all these questions correctly and get a high score. If someone can answer 5% correctly, they are considered extremely smart.

However, Grok 4 has reached doctoral level in every area of the HLE and even outperforms most human doctoral students, who would likely fail the exam.

Of course, if there is any shortcoming of Grok 4 at present, it is that it has not invented new technologies or discovered new physics yet.

But Musk believes it is just a matter of time: Grok will invent new technologies as early as the end of this year and discover new physics next year.

Massive Computing Power Trains the World's Smartest AI

Researchers on the team revealed that at the beginning, Grok 4's accuracy was only in the single digits.

But as more and more computing power was invested, a miracle happened! Eventually, it solved a quarter of the difficult questions in the HLE without any tool assistance.

Once Grok 4 was given the ability to use tools, with tool use integrated directly into the training process, its performance skyrocketed.
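To make "tool use" concrete, here is a minimal sketch of a tool-calling loop in which the model may request a toy calculator and sees the result before answering; `run_model` is a stand-in for a real model call and the single tool is hypothetical, since xAI's actual tool set and training harness are not described here.

```python
# Minimal sketch of a tool-use loop: the model may emit a tool request, the
# harness executes it, and the result is fed back before the final answer.
# ASSUMPTIONS: run_model() is a toy stand-in for a real model call, and the
# single "calculator" tool is hypothetical.

def calculator(expression: str) -> str:
    """A toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def run_model(messages: list[dict]) -> dict:
    """Toy stand-in for a model call: ask the calculator once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "input": "173 * 48"}
    last_tool_result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"content": f"The result is {last_tool_result}."}

def tool_use_loop(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = run_model(messages)
        if reply.get("tool"):                      # model asked to use a tool
            result = TOOLS[reply["tool"]](reply["input"])
            messages.append({"role": "tool", "content": result})
        else:                                      # model produced a final answer
            return reply["content"]
    return "No answer within the step budget."

print(tool_use_loop("What is 173 * 48?"))  # "The result is 8304."
```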

Moreover, Grok 4 has not yet been given any powerful enterprise-level tools.

If it is provided with enterprise-level tools such as finite element analysis, computational fluid dynamics, collision simulation, and the high-precision physics simulators used by Tesla or SpaceX, there is no doubt that Grok 4 will undergo a revolutionary change!

For example, if Grok is combined with Optimus, it can interact with the real world, propose hypotheses, and verify them on its own.

The "Heavyweight" Grok Heavy Debuts

In addition to computing power, the team explained, another major problem to solve is how to break through the data bottleneck.

RL requires not only a large supply of challenging problems but also a reliable signal that tells the model whether it is right or wrong.

However, the supply of usable test questions is running out: many problems that are hard even for humans have already become easy for AI.

Fortunately, there is an excellent referee: reality. Physics is the ultimate law, and the real world is the ultimate reasoning test for AI.

Imagine that a single AI agent can solve 40% of the problems; what happens if multiple agents are run simultaneously?

This is what is called test-time compute. By scaling it up, Grok 4 can solve more than 50% of the text-only questions in the HLE.

Run multiple AI agents in parallel, and Grok 4 Heavy is born!

These agents will work independently, compare their results with each other, and decide which one is better. Once an agent finds a key solution, it will share it with other agents, and finally, they will communicate to reach a final answer.

This is why this version of Grok 4 is called "Heavy": the scale of test-time compute has been increased by an order of magnitude.
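A minimal sketch of this idea, parallel attempts that then compare results, is shown below as a generic self-consistency scheme; the `sample_answer` stub and the simple majority vote are assumptions for illustration, not Grok 4 Heavy's actual protocol, which also lets agents share intermediate findings.

```python
# Sketch of parallel test-time compute: sample several independent attempts,
# then let them "compare results" with a simple majority vote.
# ASSUMPTION: sample_answer() stands in for one full agent run; the real
# Grok 4 Heavy agents also exchange key solutions, which this sketch omits.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import random

def sample_answer(question: str, seed: int) -> str:
    """Toy stand-in for one independent agent attempt at the question."""
    rng = random.Random(seed)
    return rng.choice(["A", "A", "A", "B"])   # most attempts agree on "A"

def heavy_answer(question: str, n_agents: int = 8) -> str:
    """Run n_agents attempts in parallel and return the most common answer."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda s: sample_answer(question, s), range(n_agents)))
    return Counter(answers).most_common(1)[0][0]

print(heavy_answer("Which option is correct?"))  # usually "A"
```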

Amazing Demonstration of Grok 4 Heavy

Grok 4 Heavy is not only good at exams; it can also solve all kinds of tricky real-world problems!

For example, we can ask it to predict the championship odds of each team in the current Major League Baseball (MLB).

It can calculate that the Los Angeles Dodgers are the favorites to win the championship this year, with a winning probability of 21.6%.

Moreover, we can also ask it to generate a visualization of the collision of two black holes.

In the resulting visualization, the collision process, from the inspiral of the two black holes through the merger to the ringdown phase, is broadly correct.

Even better, it uses the post-Newtonian approximation rather than computing the full general-relativistic effects near the black holes.

In other words, it runs a genuine simulation and reasons about which physical constants to use.
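For a sense of what a leading-order post-Newtonian treatment involves, here is a rough sketch that traces an equal-mass, quasi-circular inspiral using the Peters formula for gravitational-wave-driven orbital decay; the masses, initial separation, and time step are arbitrary demo values, and the merger and ringdown phases that Grok 4's visualization also covered are not modeled here.

```python
# Sketch of a leading-order post-Newtonian (quadrupole) inspiral for two equal
# masses on a quasi-circular orbit, using the Peters formula for orbital decay.
# ASSUMPTIONS: masses, initial separation, and time step are arbitrary demo
# values; only the inspiral is modeled, not the merger or ringdown.
import numpy as np
import matplotlib.pyplot as plt

G, c = 6.674e-11, 2.998e8          # SI units
m1 = m2 = 30 * 1.989e30            # two 30-solar-mass black holes
M = m1 + m2

a = 1.0e6                          # initial separation in meters (demo value)
dt = 1.0e-4                        # time step in seconds
phase = 0.0
xs1, ys1, xs2, ys2 = [], [], [], []

while a > 2 * G * M / c**2:        # stop near the combined Schwarzschild radius
    # Peters (1964) leading-order decay of a circular orbit from GW emission
    da_dt = -(64 / 5) * G**3 * m1 * m2 * M / (c**5 * a**3)
    omega = np.sqrt(G * M / a**3)  # Keplerian orbital angular frequency
    phase += omega * dt
    a += da_dt * dt
    # equal masses orbit their common center of mass at radius a/2
    xs1.append(0.5 * a * np.cos(phase)); ys1.append(0.5 * a * np.sin(phase))
    xs2.append(-0.5 * a * np.cos(phase)); ys2.append(-0.5 * a * np.sin(phase))

plt.plot(xs1, ys1, label="black hole 1")
plt.plot(xs2, ys2, label="black hole 2")
plt.gca().set_aspect("equal"); plt.legend()
plt.title("Leading-order PN inspiral (sketch)")
plt.show()
```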

Additionally, it can find the xAI employee with the weirdest profile picture on X.

The most amazing thing is that it actually understands what "we