
DeepSeek Update: A Single Sentence Triggers a Collective Surge in Domestic Chip Stocks

科技狐 (Tech Fox) · 2025-08-25 07:30
DeepSeek V3.1 is released. The hybrid architecture and FP8 technology reduce costs, leading to a sharp rise in the shares of domestic chip companies.

As soon as DeepSeek V3.1 was launched, an official message set off a storm in the entire AI circle.

In fewer than 20 words, the mention of a new architecture and next-generation domestic chips was packed with information and set off heated discussion.

After reading plenty of explainer articles over the past two days, Lao Hu's simple takeaway is this: domestic AI is entering the stage of software-hardware co-design, and in the future these models are expected to substantially reduce their dependence on foreign computing power from the likes of NVIDIA and AMD.

At the same time, this update also broke the industry curse that higher performance must mean higher cost, opening up enormous room for imagination in compute-hungry applications such as finance and healthcare.

The capital market's reaction was equally blunt: as soon as DeepSeek made the announcement, domestic-chip concept stocks took off. Meiri Interactive surged in late trading and closed up 13.62%.

Some netizens joked that domestic chips had seen an epic surge, and that with a single sentence from DeepSeek, the market on Friday shot straight past 3,800 points.

In the past two days, DeepSeek officially launched the V3.1 version without a massive publicity campaign, simply issuing an announcement as low-key as usual.

Lao Hu has sorted through this V3.1 update; the most central and game-changing innovation is its hybrid reasoning architecture.

This architecture supports both a thinking mode and a non-thinking mode, and lets users switch between them at any time: careful, deliberate analysis or quick answers, whichever they prefer.
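From a developer's perspective, the switch is just a different model name on an OpenAI-compatible chat endpoint. A minimal sketch, assuming DeepSeek's publicly documented model names ("deepseek-reasoner" for thinking mode, "deepseek-chat" for non-thinking); treat the exact names as assumptions that may change:

```python
def build_chat_request(prompt: str, thinking: bool) -> dict:
    """Build an OpenAI-style chat-completion payload, toggling DeepSeek's
    thinking / non-thinking mode by selecting the model name."""
    return {
        # "deepseek-reasoner" = thinking mode, "deepseek-chat" = non-thinking
        "model": "deepseek-reasoner" if thinking else "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
    }

# Same prompt, two modes -- only the model field differs:
slow = build_chat_request("Prove this step by step.", thinking=True)
fast = build_chat_request("Give me a quick answer.", thinking=False)
```

The payload would then be POSTed to the chat-completions endpoint with the usual API key; nothing else in the request shape changes between the two modes.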

Previously, in DeepSeek's product line, the division of labor was clear: the V3 model was good at general conversations, while the R1 model was more focused on in-depth thinking. The advantage of this separated architecture was that each model could perform well in its own area of expertise, but it was troublesome for users to switch back and forth.

Now, V3.1 has broken down this barrier by integrating multiple core functions such as general conversations, complex reasoning, and professional programming into the same model, making the user experience more flexible and efficient.

Moreover, the reasoning efficiency of V3.1 has also been significantly improved. Official data shows that in the thinking mode, its average performance in various tasks is on par with the previous top R1-0528, but the number of output tokens has been reduced by 20% to 50%. In the non-thinking mode, the output length is also shorter, but the performance remains the same.

Behind this is actually the "chain-of-thought compression" at work: the model learns to generate more concise and efficient reasoning paths during the training phase while ensuring the accuracy of the answers. Simply put, the algorithm has become smarter.
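The economics are easy to see with a back-of-the-envelope calculation. The token counts and the $2-per-million price below are made-up figures purely to illustrate the officially claimed 20%–50% output-token reduction, not DeepSeek's actual pricing:

```python
def api_cost_usd(output_tokens: int, price_per_million: float) -> float:
    """Cost of a response's output tokens at a given USD price per million."""
    return output_tokens / 1_000_000 * price_per_million

# Hypothetical figures for illustration only:
baseline   = api_cost_usd(4_000, 2.0)             # long, uncompressed chain of thought
compressed = api_cost_usd(int(4_000 * 0.6), 2.0)  # ~40% fewer output tokens
saving = 1 - compressed / baseline
print(f"baseline=${baseline:.4f}  compressed=${compressed:.4f}  saving={saving:.0%}")
```

Because API billing scales linearly with output tokens, a 40% shorter reasoning trace is directly a 40% cheaper call, which is what turns long-form reasoning into something deployable at scale.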

Why do this? It's simple: to save money!

In the past, although the chain of thought could enhance the model's reasoning ability, the lengthy intermediate steps would result in high computing costs and API call fees, making large-scale application difficult.

The chain-of-thought compression in V3.1 has solved this problem, transforming advanced AI reasoning ability from an academic tool into an economical solution for large-scale commercial use.

In community tests, DeepSeek V3.1 scored higher than Claude 4 Opus on the Aider multi-language programming benchmark, and at a lower cost.

Developer posts are now flooding social feeds, and the model's popularity on Hugging Face is climbing fast.

It's worth mentioning that in its announcement of V3.1, DeepSeek noted that the model uses the UE8M0 FP8 scale parameter precision, and that the tokenizer and chat template have been significantly adjusted, so it differs markedly from the previous V3.

Speaking of the "UE8M0 FP8" used in DeepSeek V3.1, after doing some homework, Lao Hu offers a quick primer:

FP8 compresses ordinary floating-point numbers into 8 bits for storage, saving both space and computing power.

Pair that with the "block scaling" idea from MXFP8: the data is divided into blocks, and each block carries its own scaling factor. This way little information is lost, and even more resources are saved.

The U, E, and M in the name stand for "unsigned + exponent + mantissa". In UE8M0, all 8 bits are used for the exponent, with no mantissa and no sign bit. That makes the data very easy for the processor to restore: it only needs to shift by the exponent, with no complex multiplications, so the path is short and the speed is high.

Another advantage of this format is its large dynamic range, which can represent both very large and very small numbers at the same time, making it less likely to overflow or be compressed to 0. That is, while ensuring the precision of the 8-bit tensor, the information loss is minimized.
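To make the bit layout concrete, here is a minimal Python sketch of the idea, an illustration rather than any production kernel: UE8M0 stores only a biased power-of-two exponent (8 bits, bias 127), and an MX-style block shares one such scale. Real MXFP8 also quantizes each element to an FP8 value; the sketch keeps elements in float to isolate the shared-scale part that UE8M0 provides:

```python
import math

def ue8m0_encode(scale: float) -> int:
    """Round a positive scale to the nearest power of two and store only
    the 8-bit biased exponent: no sign bit, no mantissa bits."""
    assert scale > 0
    return max(0, min(255, round(math.log2(scale)) + 127))

def ue8m0_decode(e: int) -> float:
    """Recover the scale -- a pure power of two, i.e. just an exponent shift."""
    return 2.0 ** (e - 127)

def mx_block_quantize(block: list[float]) -> tuple[int, list[float]]:
    """Toy MX-style block scaling: one shared UE8M0 scale per block,
    chosen from the block's largest absolute value."""
    amax = max(abs(x) for x in block) or 1.0
    e = ue8m0_encode(amax)
    scale = ue8m0_decode(e)
    return e, [x / scale for x in block]

def mx_block_dequantize(e: int, q: list[float]) -> list[float]:
    scale = ue8m0_decode(e)
    return [x * scale for x in q]
```

Because the scale is always a power of two, dividing and multiplying by it is exact in binary floating point, which is why decoding needs no multiplier: shifting the exponent is enough.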

This is exactly what makes it a good fit for the next generation of domestic chips. Why? Most current domestic AI accelerators still run on FP16/INT8 and lack a complete native FP8 unit.

The new generation of chips, such as the Moore Threads MUSA 3.1 GPU and the Verisilicon VIP9000 NPU, have begun to support native FP8, and the UE8M0 format of DeepSeek V3.1 is a perfect match for these hardware.

In summary: UE8M0 FP8 allows the model to run on the new generation of domestic chips with less space, faster speed, and more stability while maintaining precision.

This is also why DeepSeek's official Weibo specifically mentioned it, bringing new possibilities of lower cost and high performance to domestic AI.

Now for some hands-on impressions after the update. The official team also answered the question everyone cares about: whether the new model can be used directly on the official website.

Opening the official site, you can see that DeepSeek has renamed "Deep Thinking (R1)" to simply "Deep Thinking" on both the app and the web version, and officials have confirmed netizens' earlier speculation that the underlying model has been updated.

Let's see what new tricks the omnipotent netizens have come up with.

On X (formerly Twitter), an AI blogger observed that the bouncing-ball demo generated by the new model conforms better to the laws of physics, with adjustable parameters such as gravity, friction, spin, and bounce.

Someone even used DeepSeek V3.1 to code up a rhythm-reactive visual effect and instantly turned into a VJ.

Some netizens even asked V3.1 to draw a self-portrait for them, and the painting style was unexpectedly unique.

However, some users in the community still complained about translation and writing: the system prompt has to be written out by hand on the spot, and there are occasional bouts of mixed Chinese and English plus wrong characters, which gets a bit messy.

Interested Fox Friends can now go to the official website to experience it for themselves~

Lao Hu thinks that every update of DeepSeek makes people look forward to the next one. It has almost become the spiritual totem of domestic AI. Let's look forward to DeepSeek R2 together.

This article is from the WeChat public account "Tech Fox" (ID: kejihutv), author: Lao Hu, published by 36Kr with authorization.