
The new favorite of global AI developers: Jieyue Xingchen's Step 3.5 Flash topped the OpenRouter trending list in two days.

雷科技 · 2026-02-07 13:02
Winning over global AI developers is more convincing than any benchmark.

What happens when a model not only delivers real Agent capabilities, but also runs fast enough to avoid getting "stuck" during multi-round reasoning, all with a low hallucination rate? The answer:

Front-line developers and users will soon "vote" with real-money tokens.

This is exactly what is happening in the global AI community. After Jieyue Xingchen released the open-source model Step 3.5 Flash, it quickly took off worldwide: it ranked among OpenRouter's Fastest Models on day one, then topped the global Trending list in just two days.

Image source: OpenRouter

That ranking reflects neither benchmark scores nor media reviews. As a global AI model aggregation platform, OpenRouter hosts almost every well-known open- and closed-source model, along with a large number of AI developers and users worldwide. And because its leaderboard data comes from developers' and users' real API calls, it has become the most important "touchstone" for large models over the past year.

The global Trending list in particular doesn't reward the model with the "largest parameters" or the "strongest scores". It reflects only how developers and users actually call models; in other words, which model is more useful and easier to use.

That makes Step 3.5 Flash's number-one ranking all the more meaningful.

Developers who don't blindly trust scores only recognize models that pair high scores with real capability

On February 2nd, Jieyue Xingchen released the open-source model Step 3.5 Flash, which quickly drew the industry's collective attention. The first reaction was to look at its "intelligence density".

On benchmarks, Step 3.5 Flash performed well in mathematical reasoning (scoring 97.3 on AIME 2025) and code repair (reaching 74.4% on SWE-bench Verified). The PaCoRe-enhanced version even pushed the AIME 2025 score to a near-perfect 99.9.

But for developers, what's more appealing is how it does more with less.

Jieyue Xingchen published a technical report for Step 3.5 Flash detailing its architectural innovations. First, it adopts a sparse Mixture-of-Experts (MoE) architecture: while keeping a relatively compact total of 196 billion parameters, it dynamically selects the most suitable "experts" for each token, so only 11 billion activated parameters are needed to deliver frontier-level intelligence.

Think of it as a think tank of 196 top experts: when a specific coding task arrives, the system instantly picks the 11 most relevant specialists to start working. Developers and users pay the time and cost of an 11B model but get the thinking depth of a 196B model, with performance comparable to frontier models such as GPT-5.2 xHigh and Gemini 3 Pro.
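The routing idea behind sparse MoE can be sketched in a few lines. This is a toy illustration, not Step 3.5 Flash's actual implementation: a gating network scores all experts per token, but only the top-k selected experts actually run, so compute scales with k rather than with the total expert count. All dimensions and expert definitions here are made up for the demo.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy sparse-MoE layer: route one token to its top-k experts.

    x: (d,) token embedding; gate_w: (num_experts, d) gating weights;
    experts: list of callables, one per expert. Only the k selected
    experts execute, which is where the compute savings come from.
    """
    logits = gate_w @ x                      # gating score per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
gate_w = rng.normal(size=(n_experts, d))
# Each "expert" is just a fixed linear map in this sketch.
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, m=m: m @ x for m in mats]

y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # → (8,)
```

The point of the sketch is the asymmetry: the gate looks at all 16 experts, but only 2 matrix multiplies actually happen per token.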

Image source: Jieyue Xingchen

Meanwhile, to tackle the long-context bottleneck, Step 3.5 Flash mixes sliding-window and global attention layers in a 3:1 ratio (SWA + Full Attention), processing 256K-token contexts efficiently while saving substantial GPU memory. To a large extent, this resolves the inverted cost-versus-effect relationship of the Agent era.
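The 3:1 mix can be pictured as a repeating layer schedule: three sliding-window layers, then one full-attention layer. The helper below is purely illustrative; the exact interleaving and layer count in Step 3.5 Flash are not specified here, so this is an assumed layout to show the idea.

```python
def attention_schedule(num_layers, ratio=3):
    """Sketch of a mixed-attention layout (illustrative assumption):
    for every `ratio` sliding-window layers, insert one full-attention
    layer. Sliding-window layers keep cost and memory linear in window
    size; the periodic full layers preserve global context flow."""
    return ["full" if (i + 1) % (ratio + 1) == 0 else "sliding"
            for i in range(num_layers)]

print(attention_schedule(8))
# → ['sliding', 'sliding', 'sliding', 'full',
#    'sliding', 'sliding', 'sliding', 'full']
```

With this layout, only a quarter of the layers pay the quadratic full-attention cost over a 256K context, which is where the memory savings come from.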

But Step 3.5 Flash's ingenuity doesn't stop there. Tests show it sustains a generation throughput of 100-300 TPS (tokens per second) and can peak at 350 TPS in some scenarios, far above last year's mainstream level of 50-100 TPS.

Image source: OpenRouter

The key to this lies in MTP-3 (three-way Multi-Token Prediction).

Traditional decoding is like shelling beans one at a time: the model thinks one token per step. MTP-3 lets the model predict several subsequent tokens while generating the current one. This is more than a raw speed-up; to some extent it changes the model's thinking pattern, making it look a few steps ahead before it "speaks".

In multi-round tool-calling Agent scenarios, this coherence matters even more: it greatly reduces the "stuttering" and "memory loss" that models exhibit midway through complex logic, turning once stop-and-go AI operations into fast, smooth ones.
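The flavor of multi-token prediction can be shown with a toy draft-and-verify loop. This is a sketch of the general technique, not Step 3.5 Flash's actual MTP-3: a draft head guesses up to k tokens ahead, verification keeps the matching prefix, and on a miss we fall back to a single true token (here the target sequence stands in for the base model's own next-token prediction). Each loop iteration emits between 1 and k tokens, so total steps drop well below the sequence length.

```python
def mtp_decode(target, draft_fn, k=3):
    """Toy multi-token-prediction loop (illustrative, not MTP-3):
    draft k tokens ahead, accept the prefix that verification agrees
    with, otherwise fall back to a single-token step."""
    out, steps = [], 0
    while len(out) < len(target):
        draft = draft_fn(out, k)              # k-token speculative draft
        n = 0
        while (n < len(draft) and len(out) + n < len(target)
               and draft[n] == target[len(out) + n]):
            n += 1                            # accept the matching prefix
        # On a total miss, emit one "true" token (stand-in for the base model).
        out.extend(draft[:n] if n else target[len(out):len(out) + 1])
        steps += 1
    return out, steps

target = list("the quick brown fox")
def draft_fn(prefix, k):
    # A deliberately imperfect draft head: correct everywhere except spaces.
    pos = len(prefix)
    return [c if c != ' ' else '_' for c in target[pos:pos + k]]

out, steps = mtp_decode(target, draft_fn, k=3)
print(''.join(out), steps)  # → the quick brown fox 9
```

Nineteen tokens come out in nine steps instead of nineteen, and the output is still exact; that ratio, not any single fast step, is what makes multi-round Agent loops feel smooth.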

But how does it perform in practice?

In hands-on tests by YouTube tech blogger Bijan Bowen, Step 3.5 Flash accurately reproduced the differences between Swedish design and New York financial styles, and kept iterating and refining everything from fonts and layouts to interaction logic. It even generated a fully functional browser operating system (WebOS), and was the only model in the blogger's lineup that could run the classic "Memory Game" properly.

Image source: YouTube@Bijan Bowen

This ability is a direct product of the model's knowledge capacity, reasoning, and execution skills working together.

Meanwhile, a Discord user deployed Step 3.5 Flash locally on a Mac (M3 Max) with 128GB of memory. The results far exceeded expectations, reaching about 70% of the hardware's theoretical efficiency. He also noted that Step 3.5 Flash has a very low hallucination rate, outputs reliable answers and behaviors, and makes few errors even when Chinese, English, and other languages are mixed.

Image source: Discord

More flexible deployment, lower inference cost, and above all, real power and usability in actual AI scenarios made Step 3.5 Flash's popularity a natural result.

On OpenRouter especially, developers and users have seen plenty of "high score, low capability" models. Rather than score data and tests divorced from reality, they care most about how a model actually performs inside AI applications and systems. In scenarios such as Agents, deep research, and automated workflows, model migration is not cheap; the collective choice of Step 3.5 Flash by developers and users is proof enough of its usability.

At the same time, the choices developers and users make today are also shaping the Agent era.

Jieyue Xingchen: Building the engine for the Agent era

After the release of Step 3.5 Flash, Zhu Yibo, CTO of Jieyue Xingchen, noted on Zhihu that during the Step 2 stage the team still pursued larger parameters and stronger dialogue capabilities, but soon realized this path was not viable.

"Different stages of intelligence require different base-model structures." After much reflection, he concluded that the architecture designed for the L1 Chatbot era did not suit the L2 Reasoner (reasoning model), and that the L3 Agent era needed a new base-model structure even more.

Against this backdrop, Step 3.5 Flash's training goals were set from the start: strong enough logic, truly usable and efficient long-context processing, and fast reasoning. These directly determine whether a model is usable and easy to use, including its ability to correct errors and improve on its own output.

Because in Agent scenarios, users no longer watch the output process; they care about the speed, accuracy, and stability of task completion.

When Bijan Bowen tested AI-generated flight simulators and racing games, the initial version from Step 3.5 Flash still had flaws, but after feedback via prompts, the model iterated on its earlier output and the quality of the generated games improved markedly.

Image source: YouTube@Bijan Bowen

The design choices behind Step 3.5 Flash now make sense as a set: MoE keeps inference cost within a deployable range, MTP-3 raises sustained generation efficiency, and long context gets an engineering-oriented solution rather than a chase after theoretical limits. None of this is about chasing scores; it's about letting the model work through complex multi-round tasks without slowing down, losing memory, or fabricating information.

Chatbots can't do it, so we need Agents.

Behind this is a shift in the entire industry's focus. The main battlefield of large models used to be dialogue. Since 2025, models have been pulled into large-scale workflows, and tokens have become more important to developers. Users are no longer satisfied with Q&A; they want AI to handle complex tasks directly, from modifying large codebases to running complex cross-platform processes.

In this context, the choices of front-line developers and users are often more convincing than any benchmark.

The response since Step 3.5 Flash's release confirms this. From China to overseas, more developers and users are drawn to its strengths: stable Agent operation, uninterrupted multi-round reasoning, flexible deployment, and low cost. The number-one spot on OpenRouter's global Trending list directly reflects their preference for Step 3.5 Flash.

Conclusion

Since late 2022, the explosion of generative AI has proven one thing: large models can change content production, information acquisition, and even how humans interact with software. From writing and programming to search and office work, it has entered daily life.

But the true arrival of the Agent era changes things. We increasingly delegate tasks in life and work to AI, for collaboration or even full completion. More than whether it "speaks correctly", we value how well AI gets work done, whether that means comparing Mac mini (M4) prices across platforms or modifying a large codebase.

Jieyue Xingchen's open-source Step 3.5 Flash has achieved this, which is why it has succeeded in OpenRouter's real-world arena and is the model global developers and users are actually calling.

Ultimately, AI's success should depend not on how smart it seems, but on how much it improves human productivity. In that sense, the popularity of Step 3.5 Flash shows that large models must shed their flashy "demo" shells and become truly useful productivity tools.