Where is the way out for domestic large models?

硅基星芒 · 2026-06-16 18:52
The Historic Evolution from "Computing Power Hegemony" to "Architecture Decentralization"

Every time a domestic AI model is released, people say that domestic models are about to rise and will soon catch up with Anthropic. Reality keeps delivering a slap in the face: the gap between models is not narrowing but widening. Look through the various rankings on GitHub, and an author with an orange avatar shows up almost everywhere.

Yet whether people use these models by choice or by necessity, as AI moves out of the laboratory and into enterprise-level production environments, a hard business reality has emerged: the smartest models are also the most expensive ones. Fable and GPT are indeed good, but no one can afford to run them 24 hours a day without interruption. At this moment, people seem to see a glimmer of hope for domestic models.

For AI to truly boost productivity, and for the products built with it to have viable commercial value, reliance on a single cutting-edge flagship model faces a severe ROI test.

Meanwhile, domestic models with slightly inferior capabilities but price advantages urgently need to shed the stereotype of being "toys".

The deeper conflict is this: large-model manufacturers are trying to build closed intelligent agent ecosystems to establish a monopoly, while enterprise users and neutral third parties are desperately seeking an open, decoupled ecosystem.

This article therefore analyzes this complex scenario, where technology and business are intertwined, through two new engineering paradigms: multi-model dynamic routing (Fusion) and the intelligent agent meta-framework (Omnigent), revealing the historical evolution of the AI industry from "computing power hegemony" to "architectural decentralization".

01 The Computing Power Cost Trap and True vs. False Demands

Before discussing how to use international models and domestic models, people should first understand a core premise of AI economics: Tokens are a computing resource whose value is determined by intelligence.

Earlier desktop-based AI agents, which took over users' computers to perform tasks, delivered disappointing results, yet they still revealed a phenomenon: many individual and enterprise users are stuck in the dilemma of "not knowing how to scale token consumption to generate value".

Burning tokens through inefficient brute-force task execution on an immature underlying architecture inevitably creates false demand. The lukewarm reception of various agents over the past three months is enough to verify this. For enterprises to be willing to pay real money, computing power must not be consumed for its own sake. Instead, the goal is to close the largest possible task loop at the minimum computing power cost.

This is the computing power cost trap that single cutting-edge models currently face, and it confronts everyone.

For complex business tasks such as in-depth industry research or refactoring tens of thousands of lines of code, the difficulty of the individual steps follows a typical long-tail distribution.

Only a few steps may require a model with extremely high intelligence like Fable 5, while most of the remaining steps need only very basic logical capabilities. For tasks like web content scraping, basic code translation, formatted JSON output, and post-hoc checking and proofreading, there's no need to use a sledgehammer to crack a nut.

If the flagship models of the top three are used for every step of a task, the high cost will bankrupt the economic model of any SaaS product trying to commercialize.

This huge gap between performance and cost is one of the fundamental reasons why current AI applications struggle to move from the trial period into the "deep-water zone". Waiting for cutting-edge models to start a price war is probably wishful thinking. Resolving this contradiction requires a new systems engineering approach: allocate tasks according to difficulty and demand.
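As a minimal sketch of what "allocate by difficulty" could look like, the snippet below routes each step to a tier based on a difficulty score. The tier names, thresholds, prices, and the 0-to-1 difficulty score are all illustrative assumptions, not real model IDs or real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str              # hypothetical tier label, not a real model ID
    min_difficulty: float  # steps at or above this score go to this tier
    usd_per_mtok: float    # made-up price per million tokens


# Ordered from most to least capable; every number here is a placeholder.
TIERS = [
    Tier("flagship", 0.8, 30.0),
    Tier("mid", 0.4, 3.0),
    Tier("budget", 0.0, 0.5),
]


def pick_tier(difficulty: float) -> Tier:
    """Return the first (most capable) tier whose threshold the step reaches."""
    for tier in TIERS:
        if difficulty >= tier.min_difficulty:
            return tier
    return TIERS[-1]


def plan_cost(steps: list[tuple[str, float, int]]) -> float:
    """steps holds (name, difficulty, tokens); returns the routed cost in USD."""
    return sum(pick_tier(d).usd_per_mtok * t / 1_000_000 for _, d, t in steps)


# A long-tail workload: lots of easy tokens, few hard ones.
steps = [
    ("scrape pages", 0.1, 2_000_000),
    ("format JSON", 0.2, 500_000),
    ("architecture review", 0.9, 200_000),
]
routed = plan_cost(steps)
all_flagship = sum(TIERS[0].usd_per_mtok * t / 1_000_000 for _, _, t in steps)
print(f"routed: ${routed:.2f}, all-flagship: ${all_flagship:.2f}")
```

With these placeholder numbers the easy steps dominate the token count, so sending them to the budget tier removes most of the bill while the hard step still runs on the flagship tier.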

02 The Fusion Mechanism and the "Asymmetric Competition" of Domestic Models

Where is the way out for domestic models?

This question has attracted the attention of both insiders and outsiders in the AI field.

The traditional answer to this sharp question is to fine-tune with private data in specific vertical fields, but the effect is not significant because it doesn't touch the essence of the system architecture. A more direct solution at present is to seize the "domestic substitution" position through extreme cost-effectiveness. This is also the essence of the Fusion technology that OpenRouter introduced to break the deadlock.

Fusion technology, that is, multi-model dynamic routing and synthesis, has a simple but effective core logic: distribute a complex problem in parallel to multiple different models, then let a judging model fuse the results.
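The fan-out-then-judge logic can be sketched in a few lines. This is an illustration of the general pattern only, not OpenRouter's implementation: the worker names, the stub models, and the majority-vote judge are assumptions made for the example, and a real judge would itself be a stronger model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Model = Callable[[str], str]                   # prompt -> answer
Judge = Callable[[str, dict[str, str]], str]   # (prompt, drafts by worker) -> fused answer


def fuse(prompt: str, workers: dict[str, Model], judge: Judge) -> str:
    """Send the same prompt to every worker in parallel, then let the judge merge the drafts."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in workers.items()}
        drafts = {name: fut.result() for name, fut in futures.items()}
    return judge(prompt, drafts)


# Stub workers standing in for real API calls (hypothetical names).
workers = {
    "worker_a": lambda p: "Paris",
    "worker_b": lambda p: "Paris",
    "worker_c": lambda p: "Lyon",
}


def majority_judge(prompt: str, drafts: dict[str, str]) -> str:
    """Toy judge: return the most common draft."""
    answers = list(drafts.values())
    return max(set(answers), key=answers.count)


result = fuse("What is the capital of France?", workers, majority_judge)
print(result)
```

Because the workers run concurrently, latency is bounded by the slowest worker plus the judge, while the judge only has to read short drafts rather than do all the work itself.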

Take a usage pattern from programmer circles as an example: let GPT-5.5 and Opus 4.8 design the program architecture, and let DeepSeek V4 Pro write the concrete code.

Such a simple idea makes people a little skeptical. Can this "trick" really bring a way out for domestic models?

In the DRACO deep-research benchmark, a convincing data point dispelled the doubts: the "budget model group" composed of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro not only beat GPT-5.5 running alone but also scored close to the combination of top cutting-edge models, at only 50% of that combination's cost.

Two of the three models in the combination are domestic models with obvious performance gaps compared with GPT-5.5. Yet they point to the most realistic and commercially valuable way out for domestic models: become the most cost-effective "limbs" and "senses" in a powerful heterogeneous system.

Unlike the false demand created by various desktop-based agents, when real money is at stake, the pricing of Anthropic and OpenAI has made "intelligent allocation" a hard requirement for most users and enterprises.

We already know that multi-Agent collaboration is an inevitable trend in AI, and in an enterprise-level Agent architecture a single powerful model should not fight alone. This is the so-called "Mixture-of-Agents (MoA) architecture", which consists of two parts:

First, the "brain" for scheduling and judging: It occupies less than half of the token share and is composed of the flagship models of Anthropic and OpenAI, responsible for the final consensus extraction, contradiction analysis, and complex reasoning.

Second, the "main force" for execution and work: It occupies more than half of the token share and is composed of domestic or open-source models such as DeepSeek, GLM, and Kimi, responsible for reading large volumes of documents, large-scale parallel web searches, and basic code writing.

This is an idealized picture, and the actual token allocation varies with task difficulty. The important point is that through this "high-low combination", domestic models don't need to compete head-on with the top three in every dimension, especially in fields such as extreme reasoning that are heavily constrained by hardware computing power.
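The economics of the "brain" and "main force" split reduce to one line of arithmetic. The sketch below assumes, for illustration only, that cost scales linearly with tokens and that the two groups differ by a single price ratio; the 30% share and 10x price gap in the example are made-up inputs.

```python
def blended_cost_ratio(flagship_share: float, price_ratio: float) -> float:
    """Cost of a high-low mix relative to running every token on the flagship.

    flagship_share: fraction of tokens handled by the flagship "brain" (0 to 1).
    price_ratio: budget price divided by flagship price per token (0 to 1).
    """
    if not 0.0 <= flagship_share <= 1.0:
        raise ValueError("flagship_share must be between 0 and 1")
    if not 0.0 <= price_ratio <= 1.0:
        raise ValueError("price_ratio must be between 0 and 1")
    return flagship_share + (1.0 - flagship_share) * price_ratio


# Hypothetical inputs: the brain handles 30% of tokens, workers cost a tenth as much.
ratio = blended_cost_ratio(0.3, 0.1)
print(f"mix costs about {ratio:.0%} of an all-flagship run")
```

The formula also shows the floor: the bill can never drop below the flagship's own token share, so shrinking that share matters as much as finding cheaper workers.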

As long as they reach a passing level in fields such as long-text processing, basic code generation, or understanding of specific languages, and keep their API or subscription pricing highly competitive, they can occupy an indispensable position in this multi-model routing system and thus win more subscriptions.

In this way, the positioning of domestic models will change: from a "domestic substitute" for cutting-edge models, they will transform into a "computing power lever" for cutting-edge models.

By integrating into this multi-model collaboration ecosystem, domestic models will bid farewell to the score-chasing game on a single test set and, as the underlying gears of the infrastructure, truly enter the production processes of global enterprises.

03 Home-Field Advantage and Ecosystem Closure

An on-demand allocation architecture like Fusion is what both enterprise and individual users dream of, but for the technology giants that provide large models, it undoubtedly erodes their profits and control.

This leads to another obvious trend in the current industry: the construction of a "home-field advantage" in the era of intelligent agents.

Looking at recent product releases: Abroad, Anthropic and OpenAI are at odds with each other, and Claude Code and Codex are in direct confrontation; at home, Xiaomi first strengthened the binding of MiMo with MiMo Code, and then Zhipu updated ZCode 3.0 specifically for GLM.

This strong binding between the model and the calling environment (IDE/CLI) stems not only from an instinct for commercial exclusivity; it also has profound engineering logic and strategic intent behind it.

From the perspective of engineering logic, the environment is used to cover up model defects.

The relationship between the model and the intelligent agent environment is like the relationship between a programming language and an IDE. Any general large model has its unique failure modes.

When Anthropic built Claude Code, in addition to developing a command-line tool, it also hard-coded, at the lowest level, a large number of hidden system prompts, error-retry logic, and specific tool-call formats optimized specifically for Claude.

In an external general-purpose intelligent agent framework, Anthropic's model may fail tasks because of unexpected errors such as non-standard output formats; in its exclusive home field, the IDE or CLI can silently correct these errors in the background. This home-field advantage makes the model run extremely smoothly in its designated environment, giving users the illusion that "the model is absolutely leading".
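To make "silently correct these errors in the background" concrete, here is a generic sketch of a harness that unwraps a fenced reply and retries on malformed output. It is not Anthropic's code; the function name, the retry prompt wording, and the two failure modes handled are assumptions chosen for illustration.

```python
import json
from typing import Any, Callable

FENCE = "`" * 3  # Markdown code-fence marker


def call_with_repair(model: Callable[[str], str], prompt: str, max_retries: int = 2) -> Any:
    """Ask a model for JSON; unwrap fenced replies and retry quietly on parse errors."""
    attempt_prompt = prompt
    for _ in range(max_retries + 1):
        raw = model(attempt_prompt).strip()
        if raw.startswith(FENCE):
            # Failure mode 1: JSON wrapped in a Markdown fence, optionally tagged "json".
            raw = raw.strip("`").removeprefix("json").strip()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            # Failure mode 2: not JSON at all. Re-ask without surfacing the error.
            attempt_prompt = (
                f"{prompt}\n\nThe previous reply was not valid JSON ({err.msg}). "
                "Reply with JSON only."
            )
    raise ValueError("model never produced valid JSON")


# Stub model: the first reply is chatty prose, the second is fenced JSON.
replies = iter(["Sure! Here you go", FENCE + 'json\n{"status": "ok"}\n' + FENCE])
result = call_with_repair(lambda p: next(replies), "Return the status as JSON.")
print(result)
```

The caller only ever sees the parsed result, which is exactly why the same model can look flawless inside such a harness and brittle outside it.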

From the perspective of strategic intent, this is to establish a supplier "lock-in" that users cannot escape.

The progression from Prompts to Skills to Harness fully illustrates the importance of memory and environment. Once users get used to working in a specific intelligent agent framework, the large amount of accumulated context, custom configurations, and workflows makes it hard for them to leave the underlying model.

A simple API price war solves problems only for a while, whereas a well-polished closed intelligent agent environment turns the model's capabilities into a product experience.

This is the secret to Anthropic's success: When the core business processes of programmers in an enterprise are solidified in a specific intelligent agent, even if OpenAI launches a new model that makes Altman "collapse at the sight of an atomic bomb", or DeepSeek and Xiaomi launch models that are ten or even a hundred times cheaper, the enterprise cannot switch with one click because the workflows are incompatible.

This closed-island strategy is the strongest moat the giants have against multi-model routing technologies like Fusion and against open-source alternatives.

04 The Rise of the Meta-Framework and the Counterattack of Third Parties

The giants can still cope with open-source technologies, but the trend toward multi-Agent collaboration is ultimately irresistible. When enterprises find themselves forced to copy and paste between several incompatible intelligent agent islands, and have to bear high costs because they cannot switch the underlying models, a revolution at the infrastructure layer will inevitably break out.

This is the historical background to Databricks open-sourcing Omnigent. Databricks positions Omnigent as a "meta-framework (Meta-Harness)", an abstraction layer one level above any single intelligent agent.

Looking back at the history of computer science, the biggest leaps often come from new abstraction layers. When engineers were struggling to manage dozens of different servers at the same time, Google developed Kubernetes, which abstracted the underlying hardware into a unified resource pool. The AI industry is now at exactly the same juncture, and each intelligent agent and its framework (Harness) is one of those servers that are difficult to integrate.

The core value of Omnigent lies in depriving the giants of their home-field advantage and returning control to users. By building a unified API, it achieves three disruptive functions:

First, composability akin to "one-click hot-swapping".

Within a unified workflow, users can switch the node responsible for logic from Claude to another custom model with a single line of code, or call Codex and multiple self-built intelligent agents simultaneously in one project, directly breaking the giants' vendor lock-in strategy.

Second, absolute policy control that balances security and cost.

In a closed ecosystem, whether a model can be used, how it can be used, and for how long are entirely defined by the giants' black boxes. In the meta-framework, users can freely set hard limits. For example, when the token spend of a session reaches $100, the session is immediately frozen and a manual confirmation is requested, with no need to query consumption at each AI supplier.

Since the control layer has moved up to the meta-framework, the security review and cost policies that enterprise users value most can be enforced uniformly even if different models are used underneath.

Finally, elimination of context islands.

The session state is no longer stored on the servers of a single vendor but is taken over by a neutral meta-framework. Whether for human-machine collaboration or multi-Agent collaboration, there is one unified workbench.
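The three functions described above (swappable nodes, a hard spending cap with manual confirmation, and session state held by the framework) can be sketched together in one small class. This is a hypothetical illustration, not Omnigent's actual API: `MetaHarness`, its fields, and the stub models that return a reply plus a cost are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], tuple[str, float]]  # prompt -> (reply, cost in USD)


class SessionFrozen(Exception):
    """Raised when the session hit its cap and awaits manual confirmation."""


@dataclass
class MetaHarness:
    nodes: dict[str, Model]   # role name -> model callable, swappable at any time
    limit_usd: float          # hard spending cap for this session
    spent_usd: float = 0.0
    frozen: bool = False
    history: list[tuple[str, str, str]] = field(default_factory=list)  # (role, prompt, reply)

    def call(self, role: str, prompt: str) -> str:
        if self.frozen:
            raise SessionFrozen(f"cap of ${self.limit_usd:.2f} reached; confirm to continue")
        reply, cost = self.nodes[role](prompt)
        self.spent_usd += cost                      # one ledger across all vendors
        self.history.append((role, prompt, reply))  # state lives in the harness
        if self.spent_usd >= self.limit_usd:
            self.frozen = True                      # freeze until a human confirms
        return reply

    def confirm(self, new_limit_usd: float) -> None:
        """Manual confirmation: raise the cap and unfreeze the session."""
        if new_limit_usd <= self.spent_usd:
            raise ValueError("new limit must exceed what was already spent")
        self.limit_usd = new_limit_usd
        self.frozen = False


# Stub models with made-up costs stand in for real vendor calls.
harness = MetaHarness(
    nodes={"logic": lambda p: ("plan A", 60.0), "execute": lambda p: ("done", 40.0)},
    limit_usd=100.0,
)
harness.call("logic", "design the refactor")         # spends $60
harness.call("execute", "apply plan A")              # spends $40; cap reached, session freezes
harness.nodes["logic"] = lambda p: ("plan B", 1.0)   # the one-line swap of the logic node
```

After the swap the session is still frozen, so the next `call` raises `SessionFrozen` until `confirm` is invoked with a higher limit. The history recorded so far survives the swap because it belongs to the harness rather than to either model.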

Therefore, both the Fusion technology and the Omnigent framework must, and can only, come from third parties.

As mentioned before, Anthropic, OpenAI, and a number of domestic AI manufacturers have a serious capital-oriented bias. As long as their own models are not completely useless, they will never launch a framework that lets enterprise and individual users seamlessly distribute tasks to competitors to save costs.

Fusion was born at OpenRouter, a neutral model aggregation API platform; Omnigent was born at Databricks, an underlying infrastructure provider whose core strategy is "multi-cloud data neutrality". Only third parties that are completely decoupled from specific models have the motivation to create such barrier-breaking tools.

This represents the core interests of the vast number of enterprise developers: AI should be a commodifiable and substitutable computing resource, not a privilege controlled by giants.

05 Remodeling the Value Chain of AI Intelligent Agents

For the past three years, the world has been in the "model-centric" stage, with everyone looking for an all-knowing, all-powerful god that can solve every problem.

But reality has told us that Fable 5 can't do it, GPT-5.5 can't do it, and DeepSeek V4 Pro can't do it either. The only path forward is the "architecture-centric" stage.

In this new stage, the closed-loop play of a single model or a single intelligent agent is doomed to be marginalized, and the future enterprise-level AI productivity system will surely present a highly differentiated hierarchical structure:

At the bottom layer, that is, the computing power execution layer, domestic models will take on a large number of basic "brick-moving" tasks with extreme cost-effectiveness, completely shedding the fate of being toys and becoming an indispensable cornerstone.

In the middle layer, that is, the cognitive judgment layer, the flagship models of the top three will step back from the front line and no longer deal with trivial details. Instead, they will act as senior engineers in charge of the overall picture and, under a dynamic routing mechanism like Fusion, be responsible for the most difficult core convergence tasks.

At the top