The mystery is solved: Pony Alpha is made in China. The "upstarts" stage a comeback in AI Coding.
In the past two days, a model named Pony Alpha has become the hottest name in the AI circle overnight, thanks to its outstanding performance in coding capabilities.
OpenRouter does not disclose the development team behind Pony Alpha. However, according to multiple sources consulted by Zimu AI, the model is the work of one of the so-called "Six AI Upstarts" and is derived from that company's upcoming new-generation model.
In coding, Pony Alpha has demonstrated strong planning and tool-invocation abilities in AI Agent scenarios. In some public evaluations, it automatically set up an RPG game project after just two rounds of interaction.
Coincidentally, a few weeks ago, another of the "Six Upstarts", Moonshot AI (Yuezhi Anmian), shipped a key update to its model, Kimi 2.5. The update emphasizes code understanding, modification, and collaboration in long-context scenarios, moving coding toward engineering-level applications.
Over the past two years, AI Coding has been regarded as one of the most certain commercialization directions for large models. GitHub Copilot has accumulated over 20 million users and is widely adopted by enterprises, making it one of the largest paid AI products at present.
As the most-watched AI programming product today, Claude Code reached an annualized revenue of approximately $1 billion within six months of its launch, demonstrating that AI Coding can already generate real commercial revenue.
Developer adoption tells the same story. A 2025 Stack Overflow survey showed that over 80% of developers have used AI tools in their work, with programming-related uses accounting for the highest share.
Against this background, the upstarts' attempt at a comeback in AI Coding is essentially a search for a path that both represents advanced productivity and ensures stable monetization in the AGI race against the leading enterprises.
01
The Red-Envelope Battle Isn't Over, and the Coding Battle Has Begun
During this Spring Festival season, the general public's most direct contact with AI has come through red envelopes. Tech giants, through apps such as Yuanbao and Qianwen as well as Baidu, have successively handed out red envelopes on a large scale, competing fiercely to seize the AI-native entry point.
However, in the model market, another "Spring Festival season" is quietly unfolding. In the past week, OpenAI and Anthropic almost simultaneously turned "Coding" into product-level initiatives: OpenAI launched the desktop version of Codex, emphasizing multi-agent, long-running tasks; Anthropic released Opus 4.6 and enhanced Claude Code.
Unlike traditional code-completion tools, Claude Code is designed as an engineering-oriented Agent that can directly read code repositories, invoke the terminal, and run tests. It supports a closed loop of task decomposition, command execution, and result verification, which is closer to how real developers work.
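The closed loop of decomposition, execution, and verification can be pictured with a minimal sketch. Everything here is hypothetical and for illustration only: `run_closed_loop`, the toy step names, and the retry limit are assumptions, not Claude Code's actual interface or implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str
    output: str
    ok: bool


def run_closed_loop(
    task: str,
    decompose: Callable[[str], List[str]],
    execute: Callable[[str, int], str],
    verify: Callable[[str, str], bool],
    max_retries: int = 2,
) -> List[StepResult]:
    """Decompose a task, execute each step, and retry until it verifies."""
    history: List[StepResult] = []
    for step in decompose(task):
        for attempt in range(max_retries + 1):
            output = execute(step, attempt)
            ok = verify(step, output)
            history.append(StepResult(step, output, ok))
            if ok:
                break
        else:
            # No attempt passed verification: stop instead of building on a bad step.
            raise RuntimeError(f"step failed after retries: {step}")
    return history


# Toy stand-ins (hypothetical) for the model and the terminal.
def toy_decompose(task: str) -> List[str]:
    return ["read repository", "edit code", "run tests"]


def toy_execute(step: str, attempt: int) -> str:
    # Pretend the first test run fails and the retry succeeds.
    if step == "run tests" and attempt == 0:
        return "1 test failed"
    return "ok"


def toy_verify(step: str, output: str) -> bool:
    return output == "ok"


history = run_closed_loop("fix the login bug", toy_decompose, toy_execute, toy_verify)
for result in history:
    print(result.step, "->", result.output)
```

The point of the sketch is the verification gate: a failed step is retried rather than passed downstream, which is what separates this working mode from one-shot code completion.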
The core of this change is whether the model can carry out tasks autonomously. Against this background, the emergence of Kimi 2.5 and Pony Alpha marks an important step for domestic models onto the coding stage.
Let's first look at Kimi 2.5. According to the official documentation, Kimi 2.5 introduces the so-called "Agent Swarm" architecture, which can spontaneously create up to about 100 sub-agents to handle different sub-problems of a task in parallel.
This design enables multi-path parallel execution and tool invocation in complex workflows that require multi-step collaboration.
In this process, a coding task is no longer completed by a single model but is decomposed into multiple sub-tasks handled in parallel by different Agents. The point of this parallelism is not parallel generation but separation of duties.
In the official example, a complete front-end interface with interactive effects is generated from a simple natural-language prompt.
The Agent Swarm does not require pre-defined sub-agents or workflows. When it receives a complex task, it automatically assigns sub-Agents responsible for "searching, debugging, writing, and verifying" and advances them in parallel. Compared with the serial execution of traditional single-Agent models, this approach can significantly reduce task completion time.
This multi-Agent scheduling method of "separation of duties + state sharing" focuses not on generation speed but on reducing the risk of context conflicts and logical rollbacks in complex tasks, making it better suited to long-running, engineering-level processes.
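As a rough illustration of "separation of duties + state sharing", the sketch below runs one sub-agent per role concurrently and lets each write only its own entry in a shared state. It is not Kimi's implementation: `sub_agent`, `run_swarm`, and the fixed role list are hypothetical, and the short sleep stands in for real model and tool calls. The real Agent Swarm is described as choosing its sub-agents dynamically, which this sketch does not attempt.

```python
import asyncio
from typing import Dict, List

# Roles taken from the article's description; fixed here only for simplicity.
ROLES = ["searching", "debugging", "writing", "verifying"]


async def sub_agent(role: str, task: str, shared: Dict[str, str], lock: asyncio.Lock) -> None:
    """One sub-agent handles one duty and records its result in shared state."""
    await asyncio.sleep(0.01)  # stand-in for a model or tool call
    async with lock:  # serialize writes so sub-agents do not clobber each other
        shared[role] = f"{role} done for: {task}"


async def run_swarm(task: str, roles: List[str]) -> Dict[str, str]:
    """Launch all sub-agents concurrently and return the shared state."""
    shared: Dict[str, str] = {}
    lock = asyncio.Lock()
    await asyncio.gather(*(sub_agent(role, task, shared, lock) for role in roles))
    return shared


state = asyncio.run(run_swarm("build a front-end page", ROLES))
for role in ROLES:
    print(state[role])
```

Because each role owns a separate key, the sub-agents never overwrite one another's results, which is the kind of context conflict the design is said to reduce.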
As for Pony Alpha, the model that has gained popularity on OpenRouter has no official white paper. However, the public model description and community tests show that it performs prominently in long-horizon task planning and engineering-level output.
OpenRouter lists Pony Alpha with a relatively large context window (about 200K tokens). In multiple test cases, the tasks users assigned to Pony Alpha were completed successfully; most involved one-shot generation of complete data visualizations, algorithm implementations, and front-end display components.
When setting up a game architecture, Pony Alpha can complete numerical calculations, state maintenance, and visual presentation in a single generation, and it does not break the existing structure when given subsequent modification instructions.
In one community test case, a developer used Pony Alpha together with Claude Code to build a Minecraft project, generating about 170KB of pure JavaScript code in about 2 hours; the output quality was described as "beyond expectations".
Another test pointed out that the model demonstrated "aesthetic and completion levels close to those of Claude Opus 4.5" in detailed tasks such as SVG generation.
Clearly, in iterating on coding capabilities, Pony Alpha, Kimi 2.5, and American counterparts such as Claude are all targeting the same pain point: how to complete complex, "engineering-level" tasks.
That's why AI Coding is considered one of the most commercially promising directions at present. Unlike traditional chatbots, an Agentic workflow requires the model to perform multiple rounds of tool invocation, maintain long-context memory, and plan complex tasks, which sharply increases the token consumption of a single interaction.
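A back-of-the-envelope calculation shows why multi-round tool use inflates token consumption. The sketch assumes, purely for illustration, that the full context is re-sent as input on every round with no caching or compaction, and that each round appends a fixed amount of tool output and model reply; the function name and all numbers are hypothetical. Under these assumptions the cumulative input grows roughly quadratically with the number of rounds.

```python
def cumulative_input_tokens(rounds: int, base_tokens: int, added_per_round: int) -> int:
    """Total input tokens when the whole context is re-sent every round."""
    total = 0
    context = base_tokens
    for _ in range(rounds):
        total += context            # the entire context is sent as input this round
        context += added_per_round  # tool output and model reply are appended
    return total


# Hypothetical numbers: a 1,000-token prompt, 2,000 tokens appended per round.
single_turn = cumulative_input_tokens(rounds=1, base_tokens=1000, added_per_round=2000)
agentic = cumulative_input_tokens(rounds=20, base_tokens=1000, added_per_round=2000)

print(single_turn)             # 1000
print(agentic)                 # 400000
print(agentic // single_turn)  # 400
```

With these made-up figures, a 20-round agentic task consumes about 400 times the input tokens of a single chatbot-style turn.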
Stable and continuous productivity output is the direction in which enterprise (B-end) scenarios urgently need AI Coding to evolve.
In this sense, "The upstarts' comeback in AI Coding" is not just a slogan at the technical level but a practical choice:
Tech giants can choose to use red envelopes and financial power to establish the dominance of their models. However, for startups, both dominance and commercialization need to be achieved through the model itself.
In other words, in the field of domestic AI Coding in 2026, it might be the upstarts that take the lead.
02
Domestic AI Coding Might Rely on the Upstarts
Yao Shunyu, chief scientist in Tencent's CEO office, once made a judgment: in AI Coding, only the best or the most expensive models will be subscribed to over the long term.
Now, the meaning of this statement is becoming more and more specific.
Over the past year, Chinese Internet giants have not slackened their investment in AI Coding. For example, Baidu's "Wenxin Kuaima" is positioned as an enterprise-level intelligent programming assistant.
Alibaba, building on the AI capabilities of its Qwen large-model family, launched Qwen3-Coder in 2025; it focuses on code generation and engineering tasks and can compete with mainstream international models in some coding scenarios.
ByteDance deeply integrates large models with IDEs and editors through developer tools such as Trae, supporting cross-platform coding assistance and debugging.
The obvious common feature of these products is that they are deeply integrated with the giants' own large-model systems and are designed for complex internal engineering processes and enterprise-level users.
They often emphasize enterprise requirements such as standardization, security, and private deployment, and improve engineering efficiency by linking with IDEs and cloud-service platforms. They are not necessarily packaged as standardized products available for external subscription.
This approach reflects the giants' strategic logic: for them, AI Coding is first and foremost infrastructure for improving internal efficiency and business collaboration, rather than an independent track for short-term commercial competition.
They have large internal codebases, mature engineering systems, and many engineers whose daily work provides usage scenarios. They therefore prioritize internalizing these capabilities and embedding them in existing R&D processes, rather than pursuing immediate large-scale external release to validate the market.
In contrast, Kimi 2.5 and Pony Alpha have from the start been positioned around Agent-based capabilities that can be demonstrated externally and replicated at scale.
The difference is not one of superior or inferior capability but of different goals and incentives: tech giants prioritize solving efficiency and security problems within their own engineering boundaries, while some "upstarts" try to turn Agent-based capabilities into products that can be verified, subscribed to, and operated at scale externally.
In other words, AI startups have no in-house businesses to fall back on, and every technological iteration is aimed at opening up the market.
Without the support of advertising, e-commerce, or cloud services, if they still choose to stick to the route of self-developed base models, commercialization is no longer "icing on the cake" but a prerequisite for training the next-generation model.
Compared with general-purpose dialogue or content generation, AI Coding is one of the few application directions where users' willingness to pay is clear, repeat purchases make sense, and the pricing anchor is high enough.
This is why those most eager to succeed in AI Coding mostly come from outside the tech-giant camp.
Take Anthropic as an example. Its Claude model has not become a consumer-level hit like ChatGPT, but it has established a strong reputation among developers and enterprise users.
Anthropic has continuously strengthened its model's long-context stability, tool-invocation consistency, and constraint-following ability. Its goal is not a one-off generation effect but to reduce the model's error rate and rework cost in real-world engineering processes.
Once these capabilities are embedded in real-world workflows, they are extremely difficult to replace. That's why Anthropic can compete on an equal footing with OpenAI in professional development scenarios.
This path is also of reference value to domestic AI startups.
The recent strengthening of Kimi 2.5's coding and complex-task-handling capabilities, together with more engineering-oriented AI programming tools like Pony Alpha, signals not that "the model has been upgraded" but that the product logic has changed, shifting from "being able to write code" to "being able to participate in development".
Participating in development means entering the complete chain of requirement decomposition, code understanding, modification, review, and even continuous iteration. Only by succeeding in complex scenarios can a product earn a real basis for B-end repeat purchases and long-term payment.
Therefore, the competition in AI Coding is not just a battle of technical routes but a differentiation of survival strategies.
Tech giants can take their time, internalize their capabilities, and hold off on monetizing. However, for startups sticking to the base-model route, whoever gains a foothold in the coding scenario first is more likely to earn the right to place bets in the next round.
Currently, AI Coding is more like a quiet but realistic commercial endurance race.
On this track, some domestic "upstart" players are already showing signs of making a comeback.
This article is from the WeChat public account "Zimu AI". The author is Li Zhaofeng, and the editor is Wang Jing. It is published by 36Kr with permission.