
Where Is the Way Forward for Domestic Large AI Models?

硅基星芒, 2026-06-16 18:52
The historical shift from "computing power hegemony" to "architecture decentralization"

Whenever a Chinese AI model is released, commentators predict that Chinese models are about to rise and could catch up with Anthropic in the foreseeable future. Reality keeps proving these predictions wrong: the gap between the models is not closing, it is widening. Look at the various rankings on GitHub, and an author with an orange avatar shows up almost everywhere.

Whether you use it voluntarily or out of necessity, as AI moves from the laboratory into business production environments, a hard commercial reality emerges: the smartest model is always the most expensive model. Fable and GPT are good, but no one can afford to run them around the clock. In this situation, people again see an opening for Chinese models.

If AI is to genuinely raise productivity and give the resulting products real commercial value, the practice of relying on a single leading flagship model now faces a severe ROI review.

At the same time, Chinese models, which are slightly less powerful but more affordable, urgently need to shed the stubborn "toy" label.

The deeper conflict is this: large model providers are trying to build closed agent ecosystems to establish a monopoly, while enterprise users and neutral third parties are striving to open up and decouple those ecosystems.

This article therefore analyzes how technology and business intertwine, using two new engineering paradigms, "Multi-Model Dynamic Routing (Fusion)" and the "Agent Metaframework (Omnigent)", and traces the AI industry's historical shift from "computing power hegemony" to "architecture decentralization".

01 The Trap of Computing Power Costs and Real or False Demands

Before discussing how to correctly use international and Chinese models, one should first understand a central premise of the AI economy: Tokens are a computing resource, and their value is determined by intelligence.

The desktop AI agents that appeared earlier, which take over a user's computer to perform tasks, have shown disappointing results. They have, however, exposed a real phenomenon: many private and enterprise users do not know how to use tokens at scale to create value.

Burning tokens on inefficient trial and error inside an incomplete framework inevitably produces false demand. The silence of various agents over the past three months is sufficient proof. To get enterprises to pay real money, a product cannot simply consume computing power for no reason; it must complete as much of the task as possible at the lowest possible computing cost.

This is the trap of computing power costs that everyone is facing and that the current single leading model has to deal with.

In complex business tasks such as in-depth industry studies or the restructuring of thousands of lines of code, difficulty follows a typical long-tail distribution.

Only a very few steps need a model with extremely high intelligence, such as Fable 5. Most other steps, such as crawling website content, translating basic code, formatting JSON output, and double-checking results, require only very basic logical ability.

Running every step of a task on the flagship models of the "Big Three" is like using a cannon to shoot a mosquito, and in any economic model the resulting costs would bankrupt a commercial SaaS product.

This huge gap between performance and cost is one of the main reasons why current AI applications struggle to move from the testing phase into the "deep waters". Simply waiting for the leading models to start a price war is unlikely to solve the problem. A new system-engineering approach is needed instead: assign tasks according to difficulty and demand.
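Assigning steps by difficulty can be sketched as a small router. The example below is only an illustration under stated assumptions: the tier names, difficulty ceilings, and per-token prices are hypothetical, and the difficulty score of each step is assumed to come from some upstream classifier that the article does not describe.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices (USD per million tokens); not real vendor pricing.
@dataclass(frozen=True)
class Tier:
    name: str
    max_difficulty: int    # highest difficulty (1-10) this tier is trusted with
    price_per_mtok: float  # assumed price per million tokens

# Ordered from cheapest to most expensive.
TIERS = [
    Tier("budget", 3, 0.5),
    Tier("mid", 7, 3.0),
    Tier("flagship", 10, 30.0),
]

def pick_tier(difficulty: int) -> Tier:
    """Return the cheapest tier whose ceiling covers the step's difficulty."""
    for tier in TIERS:
        if difficulty <= tier.max_difficulty:
            return tier
    raise ValueError(f"difficulty {difficulty} exceeds every tier")

def plan_cost(steps: list[tuple[int, int]]) -> float:
    """steps is a list of (difficulty, tokens); returns the total cost in USD."""
    return sum(pick_tier(d).price_per_mtok * t / 1_000_000 for d, t in steps)

# A long-tail workload: nine easy steps and one hard step, 200k tokens each.
steps = [(2, 200_000)] * 9 + [(9, 200_000)]
routed = plan_cost(steps)
flagship_only = sum(30.0 * t / 1_000_000 for _, t in steps)

print(f"routed: ${routed:.2f}  flagship only: ${flagship_only:.2f}")
```

Under these made-up prices the routed plan costs a fraction of the flagship-only plan, because only the single hard step pays the flagship rate.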

02 The Fusion Mechanism and the "Asymmetric Competition" of Chinese Models

Where is the way out for Chinese models?

This question interests both people in the AI industry and outsiders.

The traditional answer to this pointed question is to fine-tune on private data in specific vertical domains, but this has yielded little because it does not address the system architecture itself. The more direct solution today is to take the position of the "Chinese alternative" with extreme cost-effectiveness. This is also the essence of Fusion, the technology OpenRouter has introduced as a solution.

Fusion, that is, the dynamic routing and synthesis of multiple models, rests on a simple but effective core principle: a complex question is sent in parallel to several different models, and an evaluation model then merges their results.
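The fan-out and merge principle can be sketched in a few lines. This is a minimal illustration of the pattern, not OpenRouter's actual API or implementation: the three model functions and the `judge` function are hypothetical stubs that stand in for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Model = Callable[[str], str]

# Hypothetical stand-ins for real API calls; each stub returns a canned draft.
def model_a(prompt: str) -> str:
    return f"A: outline for '{prompt}'"

def model_b(prompt: str) -> str:
    return f"B: data points for '{prompt}'"

def model_c(prompt: str) -> str:
    return f"C: counter-arguments for '{prompt}'"

def judge(prompt: str, drafts: list[str]) -> str:
    """Stand-in for the evaluation model. Here it only concatenates the drafts;
    a real judge would be another model call that reconciles conflicts."""
    return "\n".join(drafts)

def fuse(prompt: str, models: list[Model],
         merge: Callable[[str, list[str]], str] = judge) -> str:
    # Fan out: send the same prompt to every model in parallel.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        drafts = list(pool.map(lambda m: m(prompt), models))
    # Fan in: let the judge merge the drafts into one answer.
    return merge(prompt, drafts)

answer = fuse("EV battery market", [model_a, model_b, model_c])
print(answer)
```

`pool.map` returns results in the order the models were submitted, so the judge always sees the drafts in a stable order regardless of which call finishes first.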

Take a practice from the programmer community as an example: let GPT-5.5 and Opus 4.8 write the program architecture, and let DeepSeek V4 Pro write the specific code.

Such a simple approach makes one a little skeptical. Can this "trick" really open up a way out for Chinese models?

A convincing result on the DRACO deep-research benchmark has dispelled the doubts: a "budget model group" consisting of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro not only beat GPT-5.5 running alone but also came close to the score of the top models, at only 50% of their cost.

Two of the three models in the group are Chinese, and individually their performance lags far behind GPT-5.5. Even so, the result points to the most realistic and commercially valuable way out for Chinese models: becoming the most cost-effective "limbs" and "sensory organs" in a highly heterogeneous system.

Unlike the false demand created by various desktop agents, "intelligent assignment" has become a genuine, urgent need for most users and enterprises because of Anthropic's and OpenAI's prices.

We already know that cooperation among multiple agents is the inevitable direction of AI. In an enterprise agent architecture, a single powerful model should not act alone. This is the so-called Mixture-of-Agents (MoA) architecture, which consists of two parts:

First, the "brain" for planning and evaluation. It accounts for less than half of the token share and is handled by the flagship models of Anthropic and OpenAI. It is responsible for the final consensus, the analysis of contradictions, and complex reasoning.

Second, the "main force" for execution. It accounts for more than half of the token share and is handled by Chinese or open-source models such as DeepSeek, GLM, and Kimi. It is responsible for reading large volumes of documents, searching many websites in parallel, and writing basic code.

This is only an ideal state; the actual token split still depends on the difficulty of the task. What matters is that with this combination of high-end and low-end models, Chinese models no longer have to compete with the "Big Three" in every dimension, especially in areas such as extreme reasoning, which depend heavily on hardware computing power.
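The effect of the token split on cost is simple arithmetic: the blended price per token is the share-weighted average of the "brain" price and the "main force" price. The sketch below assumes hypothetical prices purely for illustration; none of the numbers come from a real price list.

```python
def blended_price(brain_share: float, brain_price: float, worker_price: float) -> float:
    """Share-weighted average price per million tokens.

    brain_share is the fraction of all tokens handled by the flagship "brain";
    the remaining tokens go to the cheaper "main force" models.
    """
    if not 0.0 <= brain_share <= 1.0:
        raise ValueError("brain_share must be between 0 and 1")
    return brain_share * brain_price + (1.0 - brain_share) * worker_price

# Hypothetical prices in USD per million tokens.
BRAIN_PRICE = 30.0
WORKER_PRICE = 1.0

for share in (1.0, 0.5, 0.2):
    price = blended_price(share, BRAIN_PRICE, WORKER_PRICE)
    print(f"brain share {share:.0%}: {price:.2f} USD per million tokens")
```

The smaller the share of tokens that truly needs the flagship model, the closer the blended price falls toward the cheap model's price, which is why the split depends so strongly on task difficulty.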

As long as Chinese models can achieve acceptable performance in areas such as long-text processing, basic code generation, or understanding certain languages, and keep their API or subscription prices competitive, they can occupy an indispensable position in this multi-model routing system and thus gain a larger number of subscribers.

In this way, the position of Chinese models changes: instead of being a "Chinese alternative" to the leading models, they become a "computing power lever" for them.

By integrating into this multi-model cooperation ecosystem, Chinese models leave behind the race for scores on a single test set and actually enter the production processes of global enterprises as basic building blocks of the infrastructure.

03 The Home-Field Advantage and the Closed Ecosystem

An architecture that distributes according to demand, such as Fusion, is a dream for enterprise and private users, but for the technology giants that provide large models, this undoubtedly means a weakening of their profits and control.

This leads to another obvious trend in the industry: building a "home-field advantage" in the agent era.

Consider the recent product releases. Abroad, Anthropic and OpenAI are facing off, with Claude Code and Codex in direct competition. In China, Xiaomi first tightly bound MiMo Code to MiMo, and then Zhipu updated ZCode 3.0 with a focus on GLM.

This tight binding between the model and the calling environment (IDE/CLI) is based not only on the instinct of commercial exclusivity but also on a deep engineering logic and a strategic intention.

From the engineering perspective, the environment covers for the shortcomings of the model.

The relationship between the model and the agent environment is like the relationship between a programming language and an IDE. Every general large model has its own failure modes.

When Anthropic develops Claude Code, it has to build more than a command-line tool: it also hard-codes into the framework a large number of hidden system prompts, error-retry logic, and tool-calling formats optimized specifically for Claude.

In an external, general-purpose agent framework, an Anthropic model could cause a task to fail because of non-standard output formats and other random errors. In its own home environment, however, the IDE or CLI can silently correct these errors in the background. This home-field advantage lets the model run extremely smoothly in its designated environment and gives users the impression that the model is "absolutely leading".
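What "silently correcting errors in the background" might look like can be sketched as a repair-and-retry wrapper around a model call. This is a hypothetical illustration, not code from Claude Code or any real harness: the `flaky_model` stub, the retry prompt, and the naive trailing-comma fix are all assumptions.

```python
import json
import re
from typing import Callable

def repair_json(raw: str) -> dict:
    """Recover a JSON object from a reply that may be surrounded by chatter.

    Naive on purpose: it keeps the text from the first '{' to the last '}'
    and drops trailing commas, which could also alter commas inside strings.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    candidate = raw[start:end + 1]
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)  # drop trailing commas
    return json.loads(candidate)  # JSONDecodeError is a subclass of ValueError

def call_with_repair(model: Callable[[str], str], prompt: str,
                     max_attempts: int = 3) -> dict:
    """Call the model, repair its output, and retry quietly on failure."""
    last_error = None
    for _ in range(max_attempts):
        if last_error is None:
            raw = model(prompt)
        else:
            # Feed the error back so the next attempt can correct itself.
            raw = model(f"{prompt}\nPrevious reply was invalid ({last_error}). "
                        "Return only a JSON object.")
        try:
            return repair_json(raw)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid tool call after {max_attempts} attempts")

# Hypothetical model stub: the first reply is unusable, the second is sloppy JSON.
replies = iter([
    "Sorry, I cannot format that.",
    'Sure! {"tool": "read_file", "args": {"path": "a.txt"},} Done.',
])

def flaky_model(prompt: str) -> str:
    return next(replies)

result = call_with_repair(flaky_model, "Read a.txt")
print(result)
```

The caller only ever sees the clean dictionary; the failed first attempt and the stripped chatter never reach the user, which is the effect the paragraph above describes.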

From the strategic perspective, it creates a vendor lock-in that is difficult to break.

From prompts to skills to the harness, the importance of memory and environment becomes obvious. Once users are used to working in a particular agent framework, the accumulated context, customized configurations, and workflows make it hard for them to walk away from the underlying model.

A pure API price war can only solve short-term problems, whereas a well-designed closed agent environment turns model capabilities into a product experience.

This is the secret of Anthropic's success. Once the core workflow of an enterprise's programmers is embedded in a particular agent, the enterprise cannot simply switch to a new model from OpenAI, even one that makes Altman "freeze like in front of an atomic bomb", nor to a model from DeepSeek or Xiaomi that is ten or a hundred times cheaper, because the workflows are not compatible.

This closed-island strategy is the giants' strongest protective wall against Fusion-style multi-model routing and the influence of open-source alternative models.

04 The Rise of the Metaframework and the Counter-Attack of Third Parties

The giants still have the power to hold off open-source technology, but the trend toward multi-agent cooperation is irresistible. When enterprises find themselves copying and pasting between several incompatible agent islands and bearing high costs because they cannot switch the underlying model, a revolution at the infrastructure layer becomes inevitable.

This is the historical background for Databricks' open-source release of Omnigent. Databricks has positioned Omnigent as a "Metaframework (Meta-Harness)", a more abstract level than a single agent.

In the history of computer science, the biggest leaps have often come from a new layer of abstraction. When engineers struggled to manage dozens of different servers at once, Google's Kubernetes emerged and abstracted the underlying hardware into a unified resource pool. The AI industry is now in much the same situation: the various agents and their frameworks (harnesses) are like those servers, rarely fully compatible with one another.
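The idea of a layer above individual harnesses can be sketched as a common interface plus a dispatcher. This is not Omnigent's API, which the article does not describe; the `Harness` protocol, the two vendor classes, and `MetaHarness` are hypothetical names used only to illustrate the abstraction.

```python
from typing import Optional, Protocol

class Harness(Protocol):
    """The minimal contract every wrapped agent environment must satisfy."""
    name: str

    def run(self, task: str) -> str: ...

# Hypothetical adapters; a real one would drive a vendor's CLI or SDK.
class VendorAHarness:
    name = "vendor-a"

    def run(self, task: str) -> str:
        return f"vendor-a finished: {task}"

class VendorBHarness:
    name = "vendor-b"

    def run(self, task: str) -> str:
        return f"vendor-b finished: {task}"

class MetaHarness:
    """Treats registered harnesses as one pool, so callers never bind to a vendor."""

    def __init__(self) -> None:
        self._pool: dict[str, Harness] = {}

    def register(self, harness: Harness) -> None:
        self._pool[harness.name] = harness

    def run(self, task: str, prefer: Optional[str] = None) -> str:
        if not self._pool:
            raise RuntimeError("no harness registered")
        # Use the preferred harness if available, else the first registered one.
        harness = self._pool.get(prefer) if prefer else None
        if harness is None:
            harness = next(iter(self._pool.values()))
        return harness.run(task)

meta = MetaHarness()
meta.register(VendorAHarness())
meta.register(VendorBHarness())

print(meta.run("fix failing test"))
print(meta.run("fix failing test", prefer="vendor-b"))
```

Because the workflow talks only to `MetaHarness`, swapping the underlying agent becomes a one-argument change rather than a migration, much as a scheduler hides which server a workload lands on.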