The "Working Agent" released by Zhipu doesn't require an invitation code.
Text by | Zhou Xinyu
Edited by | Su Jianxun
For today's "Six Little Tigers", showing how they will respond in the post-DeepSeek R1 era has become particularly important.
DeepSeek R1 and Manus have made a splash in reasoning models and AI Agents respectively. For latecomers, following is the most conservative path. For example, Baidu released the reasoning model Wenxin X1, and Tencent launched the Hunyuan Deep Thinking Model T1.
At its Open Day on March 31st, Zhipu, which has received a lot of funding in the domestic capital market, presented its upgraded take on R1 and Manus: "AutoGLM Contemplation" (hereinafter "Contemplation"), an Agent product with deep thinking ability that has been launched for free.
AutoGLM Contemplation.
In November 2024, Zhipu's first-generation AutoGLM sent the first red envelope in human history to be issued by an AI. "Contemplation" is more like an intern: given an open-ended question in natural language, it can understand the question, analyze it, and search for information sources.
Moreover, going a step further than "online search", "Contemplation" can access information sources without open APIs, such as CNKI, Xiaohongshu, official accounts, JD.com, and Juchao Information. It also has multi-modal understanding ability and can understand text and image information on web pages.
In one example shown by Zhipu CEO Zhang Peng, a Xiaohongshu account operated with "Contemplation" gained 5,000 followers in two weeks and received commercial orders.
The keys to starting a Xiaohongshu account are a high update frequency and attractive topics. Just input a popular topic you want to discuss, such as "A guide to recommending the full set of equipment for hand-brewed coffee", and "Contemplation" can automatically draw on hundreds of information sources, including platforms like Xiaohongshu and Zhihu, and summarize what it finds.
Zhipu's Xiaohongshu account operated by "Contemplation".
After the DeepSeek app's legendary run to 30 million daily active users, AI vendors' understanding of product paradigms has gradually changed: the ultimate form of an application is the model, and so-called applications have become showcases for model capabilities.
The interface design of "Contemplation" clearly puts more emphasis on the thinking ability of the model itself than Manus's does.
During the thinking process, "Contemplation" shows "thinking", starting from understanding and disassembling the problem, retrieving information, and then listing the framework of the solution. Manus focuses on showing "action", and the visual panel displays the process of AI calling tools.
Interface comparison between "Contemplation" and Manus. The upper one is "Contemplation", and the lower one is Manus.
Compared with Manus, which has the ambition to become the world's first general intelligent agent, the significance of "Contemplation" for Zhipu at present lies in demonstrating its model strength by showing the chain of thought, rather than just being usable and deployable.
Liu Xiao, the person in charge of Zhipu AutoGLM, also said bluntly that although "Contemplation" can perform simple tasks such as research report compilation, the current version provided to the public is only a preview version and has many deficiencies.
An intuitive comparison is that Manus can perform operations across multiple terminals such as PCs and apps by calling Claude's Computer Use ability and deliver results in specific forms such as PPTs and web pages.
For example, when inputting the prompt "Please create a Pac-Man web game with a jellycat theme, and the color saturation of the materials should not be too high", Manus can directly deliver a decent game web page (although the execution time is as long as 45 minutes and there are bugs in the game).
The Pac-Man web game delivered by Manus.
However, the current preview version of "Contemplation" can only deliver research compilations similar to Deep Research (a research intelligent agent launched by OpenAI) and cannot be used out of the box.
Given the same prompt, "Contemplation" can only output the code that implements the game. Users have to copy and run it themselves, which is not friendly to non-technical users.
The game code delivered by "Contemplation".
A Zhipu employee told "Intelligent Emergence" that "Contemplation" is still an experimental product: "'Contemplation' cannot perform cross-terminal operations. To achieve this, it is necessary to integrate functions similar to Computer Use, such as GLM-PC (a computer operation model launched by Zhipu)."
After putting a lot of effort into Agents, what kind of technological strength does Zhipu want to demonstrate?
At the Open Day, Zhang Peng broke down the combination of models required to implement "Contemplation": the base model GLM-4-Air-0414, the reasoning model GLM-Z1-Air, and the Contemplation model GLM-Z1. These three new models correspond, respectively, to the language understanding, problem analysis, and reflection and verification abilities that an Agent requires.
The new models behind "Contemplation".
It is worth mentioning that Zhipu proposed the concept of the "Contemplation Large Model", which also represents Zhipu's exploration of the next stage after R1. In Zhang Peng's view, traditional AI is significantly limited because it reasons only over its internal knowledge.
Going a step further than limited reasoning, "Contemplation" requires AI to be able to search the Internet in real-time, dynamically call tools, conduct in-depth analysis, and self-verify, thereby ensuring the reliability and practicality of successful delivery.
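The loop described above (search, call tools, analyze, self-verify) can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions, not Zhipu's implementation: `run_agent`, `Step`, and the `search`, `tools`, and `verify` callables are all hypothetical names, and the stand-in functions in the demo do no real searching or tool calling.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One entry in the visible trace of the agent's work (hypothetical type)."""
    action: str  # "search", "tool", "analyze", or "verify"
    detail: str


def run_agent(
    question: str,
    search: Callable[[str], List[str]],
    tools: Dict[str, Callable[[str], str]],
    verify: Callable[[str], bool],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Step]]:
    """Repeat search -> tool calls -> analysis -> self-verification.

    Returns the first draft that passes verification (or None if no
    draft passes within max_rounds), plus the trace of steps taken.
    """
    notes: List[str] = []
    trace: List[Step] = []
    for _ in range(max_rounds):
        # 1. Search for fresh information.
        hits = search(question)
        notes.extend(hits)
        trace.append(Step("search", f"{len(hits)} results"))
        # 2. Call every available tool (a real agent would choose among them).
        for name, tool in tools.items():
            notes.append(tool(question))
            trace.append(Step("tool", name))
        # 3. "Analyze": here this just joins the collected notes into a draft.
        draft = "; ".join(notes)
        trace.append(Step("analyze", f"draft built from {len(notes)} notes"))
        # 4. Self-verify; deliver only a draft that passes the check.
        if verify(draft):
            trace.append(Step("verify", "passed"))
            return draft, trace
        trace.append(Step("verify", "failed"))
    return None, trace


if __name__ == "__main__":
    # Stand-in callables; nothing here touches the network.
    answer, trace = run_agent(
        "hand-brewed coffee equipment",
        search=lambda q: [f"search result about {q}"],
        tools={"price_lookup": lambda q: f"prices for {q}"},
        verify=lambda draft: "prices" in draft,
    )
    print(answer)
    for step in trace:
        print(step.action, "-", step.detail)
```

The sketch returns nothing when verification keeps failing, which mirrors the idea that self-verification gates delivery rather than being a cosmetic final step.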
The disruptive move of "price butcher" DeepSeek is also forcing latecomers to either open source their models or provide models with higher cost performance.
Among the three models newly released by Zhipu, the reasoning model GLM-Z1-Air is 8 times faster than R1 at inference while costing only 1/30 as much, and it can run on consumer-grade graphics cards. All three new models will be open-sourced on April 14th.
Of course, in the "post-DeepSeek" era, whether to keep investing in pre-training and how to commercialize are questions that the "Six Little Tigers" have to answer.
Here are some thoughts of Zhang Peng, the CEO of Zhipu, at the press conference regarding Agents, model technology, and commercialization, slightly edited by "Intelligent Emergence":
- Pre-training is still very important. Although pre-training doesn't receive as much attention now, methods such as RL (Reinforcement Learning) still essentially depend on the ceiling of the base model, which is set by pre-training. As a base model developer, we will definitely keep doing pre-training.
- Future application forms, especially those of intelligent agents, will still come back to the model. Many applications will be centered on the model: wrap a thin layer of productization around it and it becomes a product. Once the model's capabilities improve, the product's capabilities improve too. This is a typical change in the new application paradigm.
- All the productization and engineering methods in between are expedients and compromise solutions. Once a brain as smart as a human is created, there will be less engineering work to do. Just like a human, only by equipping it with hands and eyes can it complete a lot of work. This is the ultimate goal of AGI.
- Large model inference follows the Scaling Law, and we found that Agents follow a similar Scaling Law as well. By expanding the inference compute during training, we observed that Agents showed stronger performance.
- Whether enterprises or users call APIs or buy models, the biggest problem they face is how to use the model well. Under such a premise, whether to open source or not and whether to offer it for free are not particularly critical issues. Implementation requires the cooperation of both parties.
- Past experience, including examples like MySQL and RedHat, has shown that open source does not mean completely free. There are still the costs of technical staff and later maintenance, as well as of working out how to deploy DeepSeek locally, for which you need a professional team. Services are therefore the business model for open source.
- A general agent cannot have weaknesses. Why are AI's thinking and writing abilities far beyond yours, yet it is still not as good as you? Because its abilities are uneven. Obvious weaknesses lead to a sharp decline in the success rate of applications.
- Why are current Agents blocked by third-party platforms? Essentially, it is because they are not smart enough. If an Agent can really pass the Turing test, I believe it will be difficult to implement the current blocking and interception strategies. So avoiding interception is essentially an engineering and technical problem.
- We will make corresponding plans in embodied intelligence, but it may take a little more time.
- I don't think we are a To B company. I hate being labeled. We only do what we think is meaningful, and these things will have different application methods and values in different scenarios or for different customers.
Feel free to get in touch!