The First Hands-On Tests of GPT-5.6: A Precision Strike on Mythos

Coming this month!

Anthropic has just unveiled the weapons it kept under wraps for two months: Claude Fable 5 and Mythos 5. The announcement landed like a bomb.

Now the pressure is squarely on OpenAI.

Meanwhile, GPT-5.6 has leaked as well.

Since last week, OpenAI has been testing two new checkpoints, internally code-named kepler and kindle. Reportedly, kindle-alpha has been selected as the release candidate.

Overseas developers and leak-watchers are putting the internal test builds of GPT-5.6 through their paces. The code names, the candidate versions, and how they perform in testing have all surfaced.

Whether it's the race to an IPO or the clash of flagship models, the two companies are playing tit-for-tat: "You file for an IPO, so will I," and "You launch a new model, so will I."

It's a real battle.

But the question is: Can GPT-5.6 really beat Mythos?

GPT-5.6 appears

So far, OpenAI has neither officially announced nor released GPT-5.6.

Nevertheless, many overseas users have already run dedicated tests on the unreleased "internal checkpoints".

A checkpoint is a snapshot of a model's parameters at a specific time during training.

OpenAI stores many such snapshots, compares them with each other, and then selects a version that is considered "good enough for release". This version is called the release candidate (RC).

Since last week, OpenAI has been testing two of these checkpoints internally, code-named kindle and kepler. Of the two, kindle-alpha has been picked as the release candidate.

Judging from the hands-on reports so far, the improvement mentioned most often in GPT-5.6 is frontend/UI generation.

User Pankaj Kumar says that kindle-alpha's frontend generation is greatly improved: without elaborate prompts or extra tricks, it directly produces better user interfaces.

Its vision capabilities are also strong. It does well on image understanding and image-reference tasks, and overall it shows significant gains in reasoning, coding, and UI generation.

Here is user Chris's result for kindle at the medium setting:

And here, for comparison, is an earlier result from another user who tested a non-reasoning model named Joule:

The first result is clearly far more appealing.

User Leo, however, tested both versions, kepler and kindle, with the same prompt at the xhigh setting.

He found that kindle actually performs worse than kepler.

Well... these results are genuinely hard to judge.

He even suspects that OpenAI will keep working on the model, and he does not rule out that the kindle version will ultimately be scrapped.

The latest news is that kindle has been pulled from the arena and a new model, Levi, has appeared.

Some users suspect that Levi could be yet another code name for an internal build of GPT-5.6 and have compared its frontend output with that of GPT-5.5:

Levi's frontend work is also very good: a fresh, minimalist style, a high level of polish, and careful attention to detail.

However, some users who dug further found that Levi may come from Meta rather than being GPT-5.6.

So, can GPT-5.6 really beat Mythos?

User mark_k claims that GPT-5.6 "beats Mythos in several agentic coding benchmarks".

For now, though, Leo's test is more convincing. He thinks GPT-5.6's situation is far from rosy:

Kindle is a regression from kepler. In its current form, Mythos would beat it easily.

In June: The "Fast and Furious" of the "Big Three"

June has arrived, summer is here, and the world of large models is heating up too.

The release windows of the three major overseas AI companies all coincide: Fable 5, Gemini 3.5 Pro, and GPT-5.6. A race against time is underway.

And they are competing on the same capabilities: reasoning, agentic ability, coding, and frontend generation.

Interestingly, all three companies have slated their releases for June, but so far only Anthropic has actually played its hand.

Gemini 3.5 Pro was unveiled at Google I/O on May 19, with a focus on a 2-million-token context window and Deep Think reasoning.

However, it is not officially available yet; the official release is planned for June.

GPT-5.6 is reportedly due later this month.

This leaves OpenAI in a delicate position: its rivals have already shown their results, while OpenAI may still be debating which RC to release.

Beyond benchmarks, pricing is also an important factor.

Fable 5 and Mythos 5 share a single price: $10 per million input tokens and $50 per million output tokens.

That's about twice the current price of Opus.

If GPT-5.6 matches Mythos or trails it slightly in performance but is much cheaper, it could still make up ground in actual usage share.
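As a rough illustration of what these prices mean per request, the Python sketch below multiplies token counts by the stated per-million-token rates. The workload size (200,000 input and 20,000 output tokens) is hypothetical, and the half-price comparison simply takes the "about twice the price of Opus" claim at face value; it is not an official price list for any model.

```python
# Fable 5 / Mythos 5 prices as stated in the article (USD per million tokens).
INPUT_PRICE_PER_M = 10.0
OUTPUT_PRICE_PER_M = 50.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = INPUT_PRICE_PER_M,
                 output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the USD cost of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical agentic-coding request: 200k tokens in, 20k tokens out.
at_stated_price = request_cost(200_000, 20_000)
# Hypothetical model charging half the stated rates (assumption, see lead-in).
at_half_price = request_cost(200_000, 20_000, 5.0, 25.0)

print(f"stated rates: ${at_stated_price:.2f}")  # stated rates: $3.00
print(f"half rates:   ${at_half_price:.2f}")    # half rates:   $1.50
```

Under these assumptions, a single request costs $3.00 at the stated rates versus $1.50 at half the price. Repeated across thousands of agentic runs, a gap of that size is what could let a cheaper model win on usage share even with slightly weaker benchmark results.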

So far, OpenAI has made no official announcement. The real battle will only begin when the official version of GPT-5.6 goes head-to-head with Fable 5.

The outcome should become clear this month, so stay tuned!

Reference links:

[1] https://x.com/mark_k/status/2063922897341567488?s=20

[2] https://x.com/AiBattle_/status/2064078302394917157?s=20

[3] https://x.com/pankajkumar_dev/status/2063272015214354908?s=20

[4] https://x.com/synthwavedd/status/2063245096951160865?s=20

[5] https://x.com/ChrissGPT/status/2063135842906808579?s=20

[6] https://x.com/koltregaskes/status/2062806155139912164?s=20

This article is from the WeChat account "Quantum Bit", author: Tingyu. Published by 36Kr with permission.

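The views expressed in this article are solely those of the author; the 36Kr platform only provides information storage services.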
