
After Zhipu and MiniMax showed their "ultimate moves", DeepSeek threw a "basic attack".

Jinduan (锦缎), 2026-02-13 08:24
The decisive moment for domestic AI is getting closer.

Who could have imagined that, in a single night, the three major domestic AI players would release new models in quick succession?

DeepSeek, Zhipu, and MiniMax staged this exciting show back to back, keeping AI enthusiasts busy throughout the Spring Festival.

With computing power scarce and products increasingly homogeneous, domestic large models are gradually diverging onto different paths:

Some are betting on the memory boundary of ultra-long texts, some are tackling the engineering implementation of agents, and some are choosing to enter the enterprise-level market with lightweight and efficient models.

01 DeepSeek: Defining the Boundary of Long Text Processing with a Million-Level Context

First came DeepSeek, long quiet on the product side yet highly anticipated worldwide, which quietly began a gray-scale test of its new model on its official website and mobile app.

Although the official has not released the formal technical documentation, the community generally speculates that this model may be the upcoming DeepSeek-V4-Lite version.

According to the currently circulating information, the parameter scale of this model may be only about 200B, and it does not use the Engram conditional memory mechanism jointly developed by DeepSeek and Peking University.

Still, a simple hands-on test reveals the core breakthrough of the new version: an ultra-long context window of one million (1M) tokens.

This figure far exceeds the 32K-128K limit of previous versions and of some mainstream domestic large models. In a single interaction, the model can process a volume of text equivalent to roughly 500 pages of A4 documents, covering everyday scenarios such as long-document analysis and cross-chapter reasoning.
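As a quick sanity check on the "500 pages" figure (an illustrative back-of-envelope calculation of ours, not from the article's sources):

```python
# Back-of-envelope check of the "1M tokens ~ 500 A4 pages" claim.
context_tokens = 1_000_000
pages = 500
tokens_per_page = context_tokens // pages
print(tokens_per_page)  # 2000

# ~2,000 tokens per page is plausible for a densely filled A4 page of
# Chinese text, where a token often covers only one or two characters;
# sparser English pages would stretch the same window over more pages.
```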

Empirical Test of the Ultra-Long Context: Searching for a Needle in a Haystack

The "needle in a haystack" test is an industry-standard method for evaluating long-text capability in the AI field: specific information is randomly inserted into an ultra-long text, and the model must accurately locate it and answer related questions, which probes how usable the context window actually is.
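The mechanics of such a test can be sketched in a few lines. This is a generic, minimal harness of our own; the community tests of DeepSeek's model are not public, and the filler text, needle, and scoring rule here are all illustrative assumptions:

```python
import random

def build_haystack(filler: str, needle: str, n_copies: int, depth: float) -> str:
    """Repeat filler text and insert the needle at a relative depth in [0, 1]."""
    chunks = [filler] * n_copies
    pos = int(depth * len(chunks))
    chunks.insert(pos, needle)
    return "\n".join(chunks)

def score(model_answer: str, expected: str) -> bool:
    """Credit the model if the expected fact appears in its answer."""
    return expected.lower() in model_answer.lower()

needle = "The secret passcode for the vault is 7381."
haystack = build_haystack("Lorem ipsum dolor sit amet.", needle,
                          n_copies=1000, depth=random.random())
question = "What is the secret passcode for the vault?"
# The prompt sent to the model under test would be:
#   haystack + "\n\nQuestion: " + question
print(score("The passcode is 7381.", "7381"))  # True
```

A full evaluation sweeps `depth` and the haystack length, plotting accuracy over both axes, which is where the "almost horizontal curve" language below comes from.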

According to test results from the technical community, DeepSeek's new model still maintains over 60% accuracy at one million tokens. The accuracy curve is almost flat within 200,000 tokens and declines only gently beyond that, outperforming the Gemini-series models tested over the same period.

If these results hold up, DeepSeek's new model not only supports a million-token context but also uses it effectively: it genuinely understands and exploits the information in ultra-long text, rather than merely accepting it as input.

In the technical community, one tester's results further corroborated this ability.

The tester uploaded to DeepSeek, in one go, a collection of 30 Markdown files containing an original fictional worldbuilding setting, about 570,000 bytes, or roughly 190,000-285,000 tokens. The tester then asked five types of detailed questions covering character backgrounds, item origins, stronghold descriptions, and more.

The model accurately located sparse information and reconstructed its context; even characters mentioned only rarely were never missed. In practice, then, on documents at the 200,000-token level, DeepSeek's new model demonstrates reliable fine-grained information retrieval.
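The byte-to-token conversion above is worth making explicit. The tester's 190k-285k range for 570,000 bytes implies a rule of thumb of roughly 2-3 bytes per token, plausible for mixed Chinese/English UTF-8 Markdown, though exact counts depend entirely on the tokenizer; the helper below is our own illustration of that arithmetic:

```python
def estimate_tokens(n_bytes: int, min_bpt: int = 2, max_bpt: int = 3) -> tuple[int, int]:
    """Return a (low, high) token estimate from a byte count,
    assuming min_bpt..max_bpt bytes per token."""
    return n_bytes // max_bpt, n_bytes // min_bpt

low, high = estimate_tokens(570_000)
print(low, high)  # 190000 285000
```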

Capability Boundary: Focusing on the Text Track

In the classic "pelican riding a bicycle" test, the vector graphic output by DeepSeek showed structural chaos and geometric distortion.

This test asks the model to generate SVG code for an unusual combined scene without any prior samples, probing its precise control over structured languages.
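The actual pelican test is judged visually, but the most basic automated check one might run on model-generated SVG is structural: does it even parse as well-formed XML with an `svg` root? The sketch below is our own illustration of that idea, not part of the benchmark:

```python
import xml.etree.ElementTree as ET

def svg_is_well_formed(svg_text: str) -> bool:
    """Return True if the text parses as XML whose root element is <svg>."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    # With a namespace the tag is "{http://www.w3.org/2000/svg}svg".
    return root.tag.endswith("svg")

good = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="50" cy="50" r="20"/></svg>'
print(svg_is_well_formed(good))          # True
print(svg_is_well_formed("<svg><circle"))  # False
```

Passing this check says nothing about whether the drawing looks like a pelican; geometric coherence is exactly the part such syntactic checks cannot capture.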

The results show that the model has limitations in code generation tasks involving geometric coordinates and spatial relationships.

This outcome follows directly from DeepSeek's technical positioning and is no surprise: like its predecessor, the new model remains a pure text model. The R&D focus is text modeling and information compression over a million-token context, not cross-modal visual-structure reasoning or precise graphics code generation.

In fact, under tight computing constraints, forgoing optimization of structured graphics languages such as SVG in favor of stronger long-text processing fits the application-first direction of domestic AI and helps carve out a differentiated technical path.

What DeepSeek's new model showed in this test is not a capability defect but an inevitable trade-off in resource allocation.

Finally, according to information circulating in the technical community and on social media, DeepSeek may be training an ultra-large model with a parameter scale exceeding 1T. While a February release is unlikely, that model may bring multimodal capability.

02 Zhipu: Agent Engineering and the Reality of Scarce Computing Power

If the lightweight model released by DeepSeek was a basic attack, then Zhipu, which followed with GLM-5, truly unleashed its ultimate move.

The release of GLM-5 was not exactly unexpected. The appearance of pony-alpha a few days earlier and a preview of the technical architecture ("Details of the GLM-5 architecture surface: DeepSeek is still an unavoidable threshold") both signaled that Zhipu was ready to launch a new product.

There is, however, one notable point in the official release announcement: Zhipu has shifted its technical narrative from "Vibe Coding" to "Agentic Engineering".

This change signals a shift in Zhipu's model capabilities: from generating code snippets and front-end demos in the past toward completing end-to-end, complex, systematic engineering tasks.

Next, let's look at GLM-5's actual capabilities.

A Leap in Reliability

First, let's look at the evaluation list of Artificial Analysis:

An open-source model ranked 4th globally in intelligence, 6th in programming, and 3rd in agentic ability!

To be honest, I was a bit shocked when I first saw this list.

This is the first time I've seen a domestic model rank so high across the board, with only a marginal gap to world-class closed-source models such as Gemini, GPT, and Claude. It proves that Zhipu's grand technical narrative is not just empty talk.

According to officially released figures, GLM-5 has 744B total parameters with 40B activated. Compared with the previous model, GLM-4.7, the parameter count has more than doubled, and the pre-training data has grown from 23T to 28.5T.

The Scaling Law still holds: more parameters and more data give GLM-5 a more solid semantic foundation for complex task processing.

Technically, it matches the earlier analysis. The model integrates DeepSeek's sparse attention mechanism (DSA) for the first time, pursuing higher efficiency and significantly lower deployment cost while preserving long-text performance.
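The general idea behind such sparse attention is that each query attends to only a small subset of past tokens instead of all of them. The toy below sketches a top-k variant with NumPy; it is our own illustration, not DeepSeek's actual DSA, which uses a separate learned indexer to choose the attended tokens rather than the raw scores used here:

```python
import numpy as np

def topk_sparse_attention(q, keys, values, k=4):
    """Attend each query to only its k highest-scoring keys (toy top-k variant)."""
    scores = q @ keys.T / np.sqrt(q.shape[-1])        # (n_q, n_kv) scaled dot products
    # Keep only the top-k scores per query; mask the rest to -inf.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]    # k-th largest score per row
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving entries (masked ones contribute exp(-inf) = 0).
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))        # 2 queries, head dim 8
keys = rng.normal(size=(16, 8))    # 16 cached tokens
values = rng.normal(size=(16, 8))
out = topk_sparse_attention(q, keys, values, k=4)
print(out.shape)  # (2, 8)
```

The efficiency gain comes from skipping the masked positions entirely in a real kernel; here the mask only illustrates which entries a sparse implementation would compute.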

At the same time, GLM-5 introduces the self-developed Slime asynchronous reinforcement-learning framework, which lets the model keep learning through long-term interaction with users and improves the coherence and stability of its task planning. Zhipu has not yet published a paper on this technique; we will interpret it further once one is released.

The more critical breakthrough is in reliability: in the AA-Omniscience hallucination test, GLM-5 cut the hallucination rate from 90% in the previous version, GLM-4.7, to 34%, beating Claude 4.5 Sonnet's record and topping the list.

A model that hallucinates frequently cannot handle complex systematic tasks. GLM-5 is markedly more cautious when generating factual content, greatly reducing the fabricated information that users hate most, and this underpins Zhipu's claim of delivering "agentic engineering".

Testing Programming and Agentic Abilities

In programming and agentic ability, GLM-5 achieved high scores on mainstream benchmarks such as SWE-bench Verified and Terminal-Bench 2.0, reaching the leading level among open-source models.

According to internal test results, GLM-5 succeeds on front-end construction tasks up to 98% of the time. In back-end refactoring and task-planning scenarios, the success rate is also up more than 20% over GLM-4.7, and the hands-on experience approaches that of Claude Opus 4.5.

GLM-5 can independently decompose user requirements and coordinate multiple toolchains, properly handling dependencies to deliver tasks end to end. For example, from a natural-language request, the model can directly generate a deployable side-scrolling puzzle game or a paper-retrieval application.

In the Vending-Bench 2 simulated business-operation test, the agent the model built to run vending machines earned $4,432 over one simulated year, demonstrating control over resource allocation, responsiveness to market fluctuations, and long-term goal consistency.

The capabilities GLM-5 demonstrates all point to the core requirement of agentic engineering: the model must maintain logical coherence and execution stability across multi-step, cross-tool, long-time-span tasks.

Generous Open-Sourcing and the Reality of Scarce Computing Power

GLM-5's strong performance is plain to see. More importantly, Zhipu chose to fully open-source GLM-5 under the MIT License, releasing it simultaneously on Hugging Face and ModelScope; it was later also connected to the domestic version of TRAE and to Ollama, effectively removing the barrier to entry for developers.

Meanwhile, as the "pride of domestic large models", GLM-5 is deeply adapted to domestic chip platforms such as Huawei Ascend, Moore Threads, and Cambricon, with low-level operator optimizations improving inference performance and adding a major pillar to the domestic computing-power ecosystem.

Such generous open-sourcing, however, stands in sharp contrast to resource shortages on the commercial side.

Alongside the release of GLM-5 came news that Zhipu's GLM Coding Plan has become more expensive: package prices are up more than 30%, the half-price first-purchase discount has been cancelled, and a weekly quota limit has been added.

The previous quarterly prices of the three subscription services were:

  • GLM Coding Lite: $60 per quarter
  • GLM Coding Pro: $300 per quarter
  • GLM Coding Max: $600 per quarter

More importantly, the official had previously announced that, at the commercial API level, GLM-5 is available only to Max package users; Pro package users will gain access once model resources are reallocated, while there is no clear word for Lite package users.

After heavy feedback on this issue, Zhipu staff quickly replied in the technical community, admitting that computing resources are extremely tight: problems such as "insufficient concurrency for a month" and "unable to meet demand even after a 20-day purchase limit" remain unsolved. Pro package users can use GLM