
The Qianwen version of Gemini 3 is here.

硅星人Pro (Silicon Star People Pro) · 2026-01-28 11:54
The first domestic version of Gemini 3

The first domestic version of Gemini 3 is here.

On January 26th, Alibaba officially released the Qianwen flagship inference model, Qwen3-Max-Thinking.

According to the official introduction, Qwen3-Max-Thinking has over one trillion total parameters and was pre-trained on as much as 36T tokens. It has broken global records on multiple authoritative benchmarks covering scientific knowledge (GPQA Diamond), mathematical reasoning (IMO-AnswerBench), and coding (LiveCodeBench), and it is the first domestic model to score full marks on both the AIME 25 and HMMT 25 math reasoning tests. Even on HLE, "Humanity's Last Exam," it scored 58.3, well ahead of GPT-5.2-Thinking's 45.5 and Gemini 3 Pro's 45.8.

What's even more crucial is the timing. If you've been following the AI scene recently, you'll have noticed that every major model maker is preparing a big release. Shipping Qwen3-Max-Thinking at exactly this juncture is clearly Alibaba's bid for the title of "the first domestic Gemini 3."

But no matter how good the leaderboard numbers look, can it really rival Gemini 3?

I had Qwen generate code several times, and in the first few attempts the failure rate was quite high. But in the scenarios Alibaba excels at, its performance is completely different. For example, when asked to build an e-commerce website for selling fruit, it wrote out product categories, add-to-cart, and checkout in one go, with complete logic and a smooth experience. It has obviously seen plenty of e-commerce scenarios, well fed on data from Taobao and Tmall, so it handles such tasks with ease.

However, for other types of tasks, the success rate is not very stable. If your requirements fall within its comfort zone, the experience will be relatively good. If not, you may need to try several times and adjust the prompts.

I also ran a more complex interactive test: a balloon-shooting game controlled by hand movements captured through the camera, which is also a classic demo shown off with Gemini 3. The requirements: use gestures to move a crosshair on screen, perform a pinch (bringing thumb and index finger together) to shoot balloons floating up from the bottom, and include details such as a sky background, drifting clouds, hit effects, and combo feedback.

Qianwen's performance surprised me a bit. It built the entire framework of the game in one go: a gradient sky background, balloons spawning at the bottom and floating up at different sizes and speeds, and a UI showing the score and combo count. All of this basic logic was correct.

The interaction was quite fun. Extend your index finger and the on-screen crosshair follows your hand; pinch thumb and index finger together and it fires. When a balloon is hit, the screen shakes slightly, the balloon bursts in a shower of particles with a "pop" sound, and consecutive hits bring up a combo counter. That kind of instant feedback is genuinely immersive.
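For reference, this is my own illustrative sketch rather than Qwen's generated code: hand-tracking libraries such as MediaPipe return 21 normalized landmarks per hand, with index 4 being the thumb tip and index 8 the index fingertip, so a pinch can be detected by thresholding the distance between the two.

```python
import math

# Landmark indices in the MediaPipe 21-point hand model.
THUMB_TIP = 4
INDEX_TIP = 8

PINCH_THRESHOLD = 0.05  # assumed threshold in normalized [0, 1] coordinates


def is_pinching(landmarks, threshold=PINCH_THRESHOLD):
    """Return True when thumb tip and index fingertip are close enough to count as a pinch.

    `landmarks` is a list of (x, y) tuples in normalized image coordinates.
    """
    tx, ty = landmarks[THUMB_TIP]
    ix, iy = landmarks[INDEX_TIP]
    return math.hypot(tx - ix, ty - iy) < threshold


# Example: a fake landmark list where thumb and index tips nearly touch.
fake_landmarks = [(0.5, 0.5)] * 21
fake_landmarks[THUMB_TIP] = (0.52, 0.50)
fake_landmarks[INDEX_TIP] = (0.53, 0.51)
print(is_pinching(fake_landmarks))  # True
```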

But there was an obvious problem in actual play: aiming was poor. Even with the finger clearly pointed at a balloon, the crosshair was always a little off, and it took several shots to hit one by luck. This is probably a deviation between the hand tracking and the screen-coordinate mapping, or a calibration algorithm that isn't precise enough. Qwen did complete the entire motion-control pipeline, with camera access, gesture recognition, and shooting feedback all working, but it missed the core accuracy of "hitting where you point," which hurts the game's playability.
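A plausible source of that offset (my guess rather than an inspection of Qwen's output) is the mapping from the camera's mirrored, normalized coordinates to screen pixels, plus the smoothing applied to reduce jitter; a minimal sketch of where those errors creep in:

```python
SCREEN_W, SCREEN_H = 1280, 720  # assumed canvas size
SMOOTHING = 0.5  # exponential-moving-average factor; higher means snappier but noisier

_prev = None


def landmark_to_screen(nx, ny):
    """Map a normalized (0-1) fingertip position to screen pixels.

    The x axis is flipped because a selfie camera produces a mirrored image;
    forgetting this flip, or using the wrong canvas size, shifts the crosshair
    away from where the finger actually points.
    """
    return (1.0 - nx) * SCREEN_W, ny * SCREEN_H


def smoothed_crosshair(nx, ny):
    """Exponentially smooth the crosshair; too much smoothing makes it lag the hand."""
    global _prev
    x, y = landmark_to_screen(nx, ny)
    if _prev is None:
        _prev = (x, y)
    else:
        px, py = _prev
        _prev = (px + SMOOTHING * (x - px), py + SMOOTHING * (y - py))
    return _prev


# Fingertip at the centre-right of the mirrored camera frame.
print(smoothed_crosshair(0.3, 0.5))
```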

However, the most remarkable thing about Qianwen this time is not its parameter count but the way it "thinks." The key improvement is in reasoning: the new model adopts a new test-time scaling mechanism that not only improves reasoning performance but is also more cost-effective.

For example, the previous approach AI took to a math problem was: write 10 answers in parallel, then vote and keep the answer that appears most often. This method is crude, wastes compute, and all 10 answers can make the same mistake.
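That "sample many, then vote" strategy is commonly known as self-consistency; a toy sketch, with a hypothetical `sample_answer` function standing in for one independent model call:

```python
import random
from collections import Counter


def sample_answer(question: str) -> str:
    """Placeholder for one independent model sample (hypothetical, not a real API)."""
    return random.choice(["42", "42", "42", "41", "43"])  # fake, noisy answers


def majority_vote(question: str, n: int = 10) -> str:
    """Sample n answers in parallel and return the most common one.

    All n samples cost compute, and if the model has a systematic blind spot,
    every sample can repeat the same mistake.
    """
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


print(majority_vote("What is 6 * 7?"))
```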

Qwen3 switches to the human way: do it once, check where it went wrong, summarize the lesson, and do it again, just like keeping a notebook of your past mistakes makes you do better the second time. The result: on the benchmark that requires solving problems with tools (HLE), Qwen scored 58.3 while Gemini scored only 45.8, a big gap.
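In spirit, that sequential approach looks like a generate-critique-revise loop; another toy sketch, with hypothetical `generate`, `critique`, and `revise` functions standing in for model calls:

```python
def generate(problem: str) -> str:
    """Hypothetical first-pass model call."""
    return "draft solution"


def critique(problem: str, answer: str) -> str:
    """Hypothetical self-check: return an empty string when no issue is found."""
    return "" if answer.endswith("(revised)") else "step 3 looks wrong"


def revise(problem: str, answer: str, feedback: str) -> str:
    """Hypothetical revision call that folds the feedback back into the answer."""
    return answer + " (revised)"


def solve_with_refinement(problem: str, max_rounds: int = 3) -> str:
    """Do it once, check what went wrong, fix it, and try again."""
    answer = generate(problem)
    for _ in range(max_rounds):
        feedback = critique(problem, answer)
        if not feedback:  # the self-check found nothing left to fix
            break
        answer = revise(problem, answer, feedback)
    return answer


print(solve_with_refinement("prove the inequality ..."))
```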

In terms of tool invocation, Qianwen builds tool-using ability into model training itself. After an initial fine-tuning stage on tool use, the Tongyi team runs joint reinforcement learning across a large number of diverse tasks, driven by both rule-based rewards and model-based rewards, giving Qwen3-Max-Thinking a smarter ability to combine tools while it thinks.
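One common way to combine a rule-based reward (did the answer match, were the tool calls well formed) with a model-based reward (a learned judge's score) is a weighted sum; the sketch below uses made-up weights and placeholder checks, not the Tongyi team's actual setup:

```python
def rule_based_reward(answer: str, reference: str, tool_calls: list) -> float:
    """Deterministic checks: exact-match correctness plus well-formed tool calls."""
    correct = 1.0 if answer.strip() == reference.strip() else 0.0
    well_formed = 1.0 if all("name" in c and "args" in c for c in tool_calls) else 0.0
    return 0.8 * correct + 0.2 * well_formed


def model_based_reward(answer: str) -> float:
    """Placeholder for a learned reward model's score in [0, 1]."""
    return 0.7  # stand-in value


def joint_reward(answer: str, reference: str, tool_calls: list,
                 w_rule: float = 0.6, w_model: float = 0.4) -> float:
    """Weighted combination used as the RL training signal (weights are assumptions)."""
    return (w_rule * rule_based_reward(answer, reference, tool_calls)
            + w_model * model_based_reward(answer))


calls = [{"name": "calculator", "args": {"expr": "2+2"}}]
print(joint_reward("4", "4", calls))
```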

The training follows three steps: first teach the model to use tools, then drill that across a variety of tasks, and finally let it become a conditioned reflex. The advantages are obvious: tool use is fast and smooth, there's no need to "read the manual" every time, and the model knows on its own when to use which tool. This is why Qwen scored 12.5 points higher than Gemini on HLE; the "muscle memory" advantage shows up especially when several tools must be chained to solve a complex problem.

In contrast, Gemini follows the traditional software-engineering approach: the model is only responsible for understanding what you want to do, and the actual tool invocation relies on an external API framework. The biggest advantage is flexibility: Google can hook up Walmart's shopping function without retraining the model, just by plugging in an API. The cost is that every tool use goes through the full loop of understanding the intent, translating it into an API call, executing it, and parsing the result, which is slow and error-prone.
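That external pipeline is roughly the standard function-calling loop; a schematic version, with a hypothetical `shopping_search` tool and a stubbed-out model call (not Google's actual framework):

```python
import json


def model_emit_intent(user_request: str) -> str:
    """Hypothetical model call that returns a structured tool request as JSON."""
    return json.dumps({"tool": "shopping_search", "args": {"query": "bananas"}})


TOOL_REGISTRY = {
    # Pluggable adapters: adding a new tool means registering a function,
    # not retraining the model.
    "shopping_search": lambda args: [{"item": "bananas", "price": 1.99}],
}


def run_tool_loop(user_request: str) -> str:
    intent = json.loads(model_emit_intent(user_request))   # 1. understand the intent
    handler = TOOL_REGISTRY[intent["tool"]]                 # 2. translate it into an API
    raw_result = handler(intent["args"])                    # 3. execute
    top = raw_result[0]                                     # 4. parse the result
    return f"Found: {top['item']} at ${top['price']}"


print(run_tool_loop("find cheap bananas"))
```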

Qianwen's code generation has gone beyond a simple "syntax translator" and reads more like a technical partner who understands your intent. It can not only turn requirements into runnable code but also shows engineering intuition: knowing when to optimize for performance, when to simplify the implementation, and when to add fault-tolerance mechanisms.

This sense of knowing how far to go is precisely the key leap for AI from "tool" to "collaborator."

This article is from the WeChat official account "Silicon Star People Pro," author: Yoky. Republished by 36Kr with permission.