
The first batch of GPT-5.6 real-world tests is here, precisely targeting Mythos

QbitAI 2026-06-10 15:17
Released this month!

Anthropic has just unveiled Claude Fable 5 and Mythos 5, models it had kept under wraps for two months, and the launch landed like a bombshell.

Now the pressure is directly on OpenAI.

At the same time, GPT-5.6 has leaked.

Since last week, OpenAI has been testing two new checkpoints, internally code-named kepler and kindle. kindle-alpha has reportedly been selected as the release candidate.

Overseas developers and leak watchers have begun intensively testing the internal test version of GPT-5.6. Its code names, candidate versions, and performance scores have all been dug up.

Whether in the race to IPO or the clash of flagship models, the two companies are matching each other move for move: "you file your prospectus, I file mine", "you release a new model, I release one too".

They are locked in a fierce battle.

But the question is: can GPT-5.6 really beat Mythos?

GPT-5.6 Emerges

As of now, OpenAI has made no official announcement about GPT-5.6, and the model has not been released.

However, many overseas netizens have run probe tests on the internal checkpoints that have not been made public.

A checkpoint is a snapshot of a model's parameters saved at a certain point during training.

OpenAI keeps many such snapshots internally. After comparing them side by side, it picks the version it considers good enough to ship, and that version is called the release candidate (RC).
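The selection step described above can be sketched in a few lines. This is only an illustration of the general idea, not OpenAI's actual process: the `Checkpoint` class, the placeholder names `ckpt-a` to `ckpt-c`, and every score below are hypothetical values invented for the example, and a real comparison would weigh many evaluations rather than one number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    name: str          # code name of the snapshot (hypothetical)
    step: int          # training step at which the parameters were saved
    eval_score: float  # aggregate evaluation score (made-up illustrative value)


def pick_release_candidate(checkpoints):
    """Return the checkpoint with the highest evaluation score."""
    if not checkpoints:
        raise ValueError("no checkpoints to compare")
    return max(checkpoints, key=lambda c: c.eval_score)


# Hypothetical snapshots saved at different points in training.
candidates = [
    Checkpoint("ckpt-a", 10_000, 0.71),
    Checkpoint("ckpt-b", 20_000, 0.78),
    Checkpoint("ckpt-c", 30_000, 0.75),
]

rc = pick_release_candidate(candidates)
print(rc.name)  # prints "ckpt-b"
```

Note that the latest snapshot (`ckpt-c`) is not the one chosen here: a later checkpoint can score worse than an earlier one, which is why candidates are compared rather than simply taking the newest.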

Since last week, OpenAI has been testing two new checkpoints, code-named kindle and kepler. Of the two, kindle-alpha has been selected as the release candidate.

Judging from the leaked feedback, the most frequently mentioned upgrade in GPT-5.6 is front-end/UI generation.

Netizen Pankaj Kumar said that kindle-alpha's front-end generation ability has improved greatly. It can produce stronger interface output directly, without complex prompts or additional skills.

In addition, its vision capabilities are strong. It performs well on image understanding and image citation tasks, and shows clear overall improvements in reasoning, coding, and UI generation.

This is the result of netizen Chris's actual test of kindle using the medium setting:

And this is the result of another netizen's earlier test of the non-reasoning version, Joule:

The former is clearly much more polished.

But netizen Leo tested kepler and kindle with the same prompt at the xhigh setting.

He found that kindle has actually regressed compared with kepler.

Well... this result is hard to assess.

He even speculated that OpenAI will likely keep polishing the model, and that the kindle candidate may ultimately be abandoned.

The latest news is that kindle has been removed from Arena, and a new model, Levi, has appeared.

Some netizens speculate that Levi may also be a code name for an internal version of GPT-5.6, and they compared its front-end ability with that of GPT-5.5:

Levi's front-end output is also quite strong, with a clean, simple style, a polished feel, and excellent handling of detail.

However, some netizens found after investigation that Levi may come from Meta and may not be GPT-5.6 at all.

So, can GPT-5.6 really beat Mythos?

Netizen mark_k claims that GPT-5.6 "beats Mythos on multiple agentic coding benchmarks".

But for now, the more convincing evidence is netizen Leo's hands-on test mentioned earlier. He believes the outlook for GPT-5.6 is not optimistic:

kindle has regressed compared with kepler. In its current form, it will be easily defeated by Mythos.

In June, the Big Three Stage Their Own "Fast and Furious"

Summer arrives in June, and the large-model scene is heating up too.

The release schedules of the three major overseas AI companies have collided: Fable 5, Gemini 3.5 Pro, and GPT-5.6 are in a race against time.

Moreover, they are competing on the same set of capabilities: reasoning, agents, coding, and front-end generation.

Interestingly, although all three companies have set their release dates in June, so far only Anthropic has actually released its product.

Gemini 3.5 Pro was unveiled at the Google I/O conference on May 19th, featuring a 2-million-token context and Deep Think reasoning.

But it has not been officially launched yet, and the official release is scheduled for June.

It is reported that GPT-5.6 will be released later this month.

This also adds a layer of tension to OpenAI's situation: the competitors have already shown their scores, and OpenAI may still be struggling to decide which RC version to release.

But in addition to performance scores, pricing is also an important factor.

Fable 5 and Mythos 5 are uniformly priced at $10 per million input tokens and $50 per million output tokens.

This is about twice the price of the existing Opus.

If GPT-5.6 is on par with or slightly behind Mythos in capability but much cheaper, it still has a chance to gain an edge in actual adoption.
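To make the pricing concrete, here is a small sketch of the cost arithmetic. The $10 and $50 per-million-token rates come from the figures quoted above. The workload size (200,000 input tokens and 20,000 output tokens) and the half-price comparison rates are assumptions chosen purely for illustration, and `request_cost` is a hypothetical helper, not part of any vendor's API.

```python
# Rates quoted in the article for Fable 5 and Mythos 5 (USD per million tokens).
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 50.0


def request_cost(input_tokens, output_tokens,
                 in_price=INPUT_PRICE_PER_M, out_price=OUTPUT_PRICE_PER_M):
    """Return the cost in USD for a given number of input and output tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
fable_cost = request_cost(200_000, 20_000)

# Hypothetical competitor charging half as much per token (an assumption,
# not a reported price).
cheaper_cost = request_cost(200_000, 20_000, in_price=5.0, out_price=25.0)

print(f"${fable_cost:.2f}")    # prints "$3.00"
print(f"${cheaper_cost:.2f}")  # prints "$1.50"
```

Because cost scales linearly with the per-token rates, a model priced at half the rate halves the bill for the same workload, which is the adoption advantage the pricing argument relies on.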

OpenAI has not yet made any official announcement. The real showdown will come when the official version of GPT-5.6 goes head-to-head with Fable in performance tests.

The outcome will most likely be clear this month. Stay tuned!

Reference links:

[1] https://x.com/mark_k/status/2063922897341567488?s=20

[2] https://x.com/AiBattle_/status/2064078302394917157?s=20

[3] https://x.com/pankajkumar_dev/status/2063272015214354908?s=20

[4] https://x.com/synthwavedd/status/2063245096951160865?s=20

[5] https://x.com/ChrissGPT/status/2063135842906808579?s=20

[6] https://x.com/koltregaskes/status/2062806155139912164?s=20

This article is from the WeChat official account "QbitAI", author: Tingyu. Republished by 36Kr with permission.