
Elon Musk let it slip that Claude Opus has 5T parameters and Sonnet has 1T parameters.

量子位 (QbitAI) | 2026-04-10 16:50
Grok 4.2 has a total parameter count of 0.5T.

Oh no, did Elon Musk accidentally reveal the parameters of Claude?

To make a long story short: Sonnet 1T, Opus 5T.

The trigger: Musk posted that xAI's Colossus 2 supercomputer is training 7 models, the largest of which reaches 10 trillion parameters.

The complete list:

- Imagine V2
- 2 variant models with 1 trillion (1T) parameters
- 2 variant models with 1.5 trillion (1.5T) parameters
- 1 model with 6 trillion (6T) parameters
- 1 model with 10 trillion (10T) parameters

P.S. Colossus 2 is part of Musk's Macrohard plan. According to details disclosed in August 2025, Colossus 2 has installed 119 air-cooled chiller units providing about 200MW of cooling capacity, enough to support about 110,000 GB200 GPUs in NVL72 rack configuration.

According to the plan at the time, Colossus 2 would deploy 110,000 NVIDIA GB200 GPUs in the first phase, with an ultimate goal of more than 550,000 GPUs and a peak power demand expected to exceed 1.1GW.
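As a back-of-the-envelope sanity check, the per-GPU cooling and power budgets implied by these figures can be computed directly. The per-GPU numbers below are derived estimates, not official specifications:

```python
# Rough sanity check of the Colossus 2 figures quoted above.
# All inputs come from the article; the per-GPU results are derived estimates.

cooling_mw = 200        # ~200 MW of cooling from 119 air-cooled chillers
phase1_gpus = 110_000   # first-phase GB200 deployment
target_gpus = 550_000   # stated ultimate goal
peak_gw = 1.1           # expected peak power demand at full build-out

# Cooling budget per GPU in phase 1 (covers networking, CPUs, etc., too)
kw_per_gpu_phase1 = cooling_mw * 1000 / phase1_gpus
print(f"~{kw_per_gpu_phase1:.1f} kW of cooling per GPU in phase 1")

# Power budget per GPU at full build-out
kw_per_gpu_target = peak_gw * 1_000_000 / target_gpus
print(f"~{kw_per_gpu_target:.1f} kW of power per GPU at 550k GPUs")
```

Both come out around 2 kW per GPU at the system level, which is a plausible all-in figure for GB200-class hardware, so the quoted numbers are at least internally consistent.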

This post is also one of the few occasions on which Musk has publicly detailed the Colossus supercomputer's training plans.

As soon as the news broke, netizens got curious, and Musk, apparently in a good mood, replied to many of their questions.

For example, someone asked "How long does it take to train a 10T model?", and Musk said the pre-training stage takes about 2 months.

Which led to the following exchange.

Grok 4.2's parameter count is only 5% of that of the largest model xAI is currently training: 500 billion (500B) versus 10 trillion (10T), a 20x gap.

Is Grok 4.2's total parameter count really 500B? Or is 500B merely the activated parameter count of a larger MoE model?

Facing the doubts, Musk responded in person:

The total parameter count is 0.5T (500 billion). The current Grok has half the parameter count of Sonnet and one-tenth of Opus. For its scale, it is a very powerful model.

Netizens immediately spotted the implication: Sonnet is 1T and Opus is 5T.
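Musk's reply pins down the ratios, so the implied sizes follow from simple arithmetic:

```python
# Grok's stated total parameter count, per Musk's reply
grok_total_t = 0.5  # trillions of parameters

# "half the parameter count of Sonnet" -> Sonnet = Grok / 0.5
sonnet_t = grok_total_t / 0.5
# "one-tenth of Opus" -> Opus = Grok / 0.1
opus_t = grok_total_t / 0.1

print(f"Implied Sonnet size: {sonnet_t:.0f}T")  # 1T
print(f"Implied Opus size: {opus_t:.0f}T")      # 5T
```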

So someone asked further:

Just curious, how do you (Musk) know the sizes of Sonnet and Opus?

Musk kept silent on this. The netizens' point is reasonable: "Top talent flows between these few companies, and it seems no secret can be kept for long."

Parameters of different versions of Claude, speculated by netizens

Since the launch of the Claude series of models, Anthropic has been keeping the parameter scale strictly confidential. Whether it's Opus or Sonnet, not a single detail has been disclosed.

The less they say, the more enthusiastic netizens are about the discussion.

We used AI to summarize the parameter scales of different versions of Claude analyzed and discussed by netizens.

Surprisingly, the estimates for the latest models (Claude 4.6 Sonnet at ~1-2T; Claude 4.6 Opus at ~1.5-2.5T, or 2-5T by some accounts) really do line up with the "Sonnet 1T, Opus 5T" that Musk accidentally leaked.

Let's take a look at what netizens have discussed.

Currently, there are four main methods of speculation:

- Inference cost and throughput reverse-deduction: inference cost scales roughly linearly with activated parameter count, so the total parameter count can be estimated from the architecture type plus industry rule-of-thumb coefficients.
- Performance benchmark comparison: compare the closed-source model's scores against open-source models of known size on standardized benchmarks, then infer its parameter scale.
- Internal leaks and rumor analysis: information accidentally exposed by officials, plus assorted rumors.
- Architecture feature analysis: infer the architecture type from the model's observable behavior, then narrow the plausible parameter range.
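The first method can be sketched concretely: anchor on an open model with known size and pricing, scale by the price ratio to estimate activated parameters, then divide by an assumed MoE sparsity ratio to get a total. Every numeric input below is an illustrative placeholder, not real pricing or a confirmed architecture detail:

```python
# Sketch of the "inference cost reverse-deduction" method described above.
# All numbers are hypothetical placeholders for illustration only.

def estimate_params(
    target_price_per_mtok: float,   # closed model's output price, $/1M tokens
    anchor_price_per_mtok: float,   # open anchor model's price at similar margins
    anchor_active_params_b: float,  # anchor model's known activated params (B)
    assumed_sparsity: float,        # assumed fraction of params active per token
) -> tuple[float, float]:
    """Return (activated, total) parameter estimates in billions."""
    # Cost is assumed ~linear in activated params, so scale by the price ratio.
    active_b = anchor_active_params_b * (target_price_per_mtok / anchor_price_per_mtok)
    # For an MoE, total params = activated params / sparsity ratio.
    total_b = active_b / assumed_sparsity
    return active_b, total_b

# Hypothetical example: anchor on an open MoE with 37B activated params
# priced at $1.10/Mtok; target closed model priced at $3.00/Mtok;
# assume ~1/10 of total parameters are active per token.
active, total = estimate_params(3.00, 1.10, 37.0, 0.10)
print(f"estimated activated: ~{active:.0f}B, total: ~{total:.0f}B")
```

The weak points are exactly where the assumptions enter: provider margins may differ, and the sparsity ratio is a guess, which is why estimates produced this way span such wide ranges.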

First, let's look at the Claude 3 series, released in March 2024. This was the first Claude generation to form a clear product matrix, with three differently positioned versions:

The small Haiku, the mid-tier Sonnet, and the large Opus, with cost and capability rising in that order.

Regarding their parameter scales, Alan D. Thompson, the founder of LifeArchitect.ai, once gave an estimate:

Claude 3 Haiku (~20B), Claude 3 Sonnet (~70B), Claude 3 Opus (~2T).

The Reddit community later kept discussing Claude 3 Sonnet; some netizens, reasoning from its performance, put its parameter count at 150-250B.

Next came Claude 3.5, a major upgrade that outperformed GPT-4o on several key metrics.

However, Anthropic initially released only a single model, Claude 3.5 Sonnet.

Its speed is twice that of Claude 3 Opus, but the cost is only 1/5 of the latter.

On the parameter count, a paper from Microsoft and collaborators noted that, according to industry estimates, Claude 3.5 Sonnet has about 175B parameters.

Estimated parameters for other models: ChatGPT about 175B, GPT-4 about 1.76T, GPT-4o about 200B, o1-mini about 100B, and o1-preview about 300B.

After that, Anthropic moved past the 3.5 naming without ever releasing a 3.5 Opus. Following Claude Sonnet 3.7, it went straight to the 4 series with two models:

Claude Opus 4 and Claude Sonnet 4.

Industry parameter estimates for Claude 4 vary widely.

Industry estimates put Claude Opus 4 at 300-500B parameters and Claude Sonnet 4 at 50-100B.

Next, Claude Opus 4.1 was released.

Its coding performance improved yet again, surpassing Claude Opus 4, with further gains on agent tasks and reasoning.

However, at release the company said it planned larger-scale upgrades and improvements to the model in the following weeks, suggesting 4.1 was just a minor update standing in for Opus 4.

Some netizens even speculated that Anthropic had not originally intended to release the model, but with news about GPT-5/Gemini-3 piling up, shipped it early to stay competitive.

This may also be one of the reasons why there