Testing Zhipu's "first mobile phone agent": new ideas, but in the real world, nothing but a sense of powerlessness
If an Agent can only use fixed functions, is it still an Agent?
At last year's AutoGLM launch event, Zhipu CEO Zhang Peng used AutoGLM to run a fully automated sequence: creating a WeChat face-to-face group, renaming the group, and sending a hundred red envelopes totaling 20,000 yuan into it. The audience was amazed and called it an AI that can "control a mobile phone." But once the demo stage is swapped for the real world, problems appear immediately. Users run different WeChat versions with different UI layouts, some use foldable screens while others use small-screen phones, and ad pop-ups can interrupt a task at any moment. For a large model, these uncertainties are variables it cannot fully account for.
Zhipu's answer is not to keep improving the model's "cognitive ability" but to take a different route: sidestep the uncertainty of the real world entirely by building a "standardized" world. At its core, AutoGLM 2.0 is not an algorithmic breakthrough but a cloud phone with a uniform screen size, fixed software versions, and a fixed feature scope. Only inside this virtual world can the Agent's operations be guaranteed to work.
In other words, the idea behind AutoGLM 2.0 is not to tame the mobile phone, but to tame chaotic reality by building an environment of its own.
So what does the product built on this idea actually look like?
1
AutoGLM's "Standardized" World
AutoGLM comes with two cloud devices: an agent phone and an agent computer. The phone serves mainly as a life assistant that handles travel and food ordering, while the computer takes on office work and research.
Let's look at each device in turn to see what Zhipu kept and what it gave up in this custom-built world where the Agent runs.
AutoGLM Cloud Phone
Apart from built-in system apps such as the camera and clock, the phone comes with 30 additional apps, which basically cover daily life and entertainment.
Social and Information: Weibo, Xiaohongshu, Toutiao
Long and Short Videos: Hongguo, Douyin, Kuaishou, Bilibili, iQiyi, Tencent Video, Mango TV
Music and Radio: QQ Music, Qishui Music, Himalaya
Novel Reading: Fanqie Novel
Consumption and Shopping: Flush, Taobao, JD.com, Pinduoduo
Local Life: Meituan, Ele.me, Dianping, Alipay, KFC, Keep
Travel and Tourism: Didi Chuxing, Ctrip, Qunar, Gaode
Renting: Ke.com
Want to download new apps through the browser? Unfortunately, Zhipu has blocked that route. When I tried to install Hema and Zhihu, the system refused outright with a message that the package was invalid, and Zhihu's official download page even returned a 404 error.
Next, the configuration. The cloud phone runs Android 14 and reports the model number SM-F900F. A search shows that this number belongs to Samsung's first-generation foldable phone, the Galaxy Fold. (Wouldn't a Redmi have been a better choice?) If the cloud phone reproduces the complete device rather than just the Fold's system, its configuration would be a Snapdragon 855 processor with 12GB RAM + 512GB storage.
AutoGLM Cloud Computer
The AutoGLM cloud computer runs Ubuntu. As for software, apart from the browser, only LibreOffice is installed. Perhaps what AutoGLM means by "office work" is just a Word, Excel, and PowerPoint-style suite. Likewise, the cloud computer has Ubuntu's software store removed and blocks users from installing new software.
2
AutoGLM Hands-On Test: Ads and Logins Are the Biggest Obstacles
With a general picture of the cloud phone and cloud computer, we also know the limits of what AutoGLM can do. Next, let's see how AutoGLM performs inside this constrained environment.
Cloud Phone Task - Taobao Shopping:
Prompt: Help me clear my Taobao shopping cart, then buy the 1TB iPhone 16 Pro from the official store.
In the Taobao test, AutoGLM showed a fairly complete workflow, from web search and clarifying requirements to operating the app. The trouble came at account login: nearly every domestic app now requires you to log in before use. When AutoGLM detects a login prompt, it asks the user to "take over." The user enters the account and password manually, then ends the takeover so the task can continue.
The problem is that domestic apps are extremely "security-conscious" about accounts, and many require verification far more involved than simply entering credentials. When logging in to Xiaohongshu, for example, I was told to scan a code with my old device, but that device was the one I was using to run AutoGLM. When logging in to Douyin, I had to pass a face scan, but AutoGLM kept showing that it was loading (camera) resources. When the camera finally loaded, my image was so distorted that recognition failed and the login did not go through.
The failed Xiaohongshu login also broke features that depend on it. In last year's version of AutoGLM, for example, you could search Xiaohongshu for a braised pork recipe and its ingredients, then buy them at Xiaoxiang Supermarket.
Of course, AutoGLM can't be blamed for this; the domestic mobile app ecosystem may simply not suit something like AutoGLM. Making matters worse, AutoGLM doesn't remember the user's account and password. That is very secure, but logging in to apps every single time is painful.
I also found that after AutoGLM accesses the camera for Douyin's face-recognition login, exiting Douyin on the cloud phone (clearing it from the background) does not release the camera. AutoGLM keeps holding the camera until you shut AutoGLM down.
Back to the task: login issues aside, AutoGLM handled steps such as clearing the shopping cart easily. Before key actions such as deleting items and making a purchase, it asks the user to confirm whether to continue.
Cloud Phone Task - Buying Air Tickets:
Prompt: Go to Qunar.com and buy a plane ticket from Shanghai to Beijing for a flight between 11:00 and 14:00 the day after tomorrow. No Boeing planes, please.
The task isn't complicated, but I deliberately ran it at around 23:00, close to midnight. The AI needs to pick up several key points: "the day after tomorrow," "11:00 to 14:00," "Shanghai to Beijing," and "no Boeing."
I tried twice and failed both times. Checking the flights it chose, AutoGLM got the time wrong on the first attempt and the date wrong on the second. Both attempts did get the origin and destination right, however, and neither picked a Boeing aircraft.
To find the cause, I closely examined how AutoGLM operated. First, there was a bug in date selection: after entering and leaving the calendar page, "the day after tomorrow" would sometimes inexplicably become the day after that. The problem doesn't reproduce consistently, but it shows that AutoGLM is not yet reliable even in basic interactions.
The time-selection problem is even more obvious. AutoGLM doesn't really understand the "11:00 to 14:00" condition. Instead, it mechanically relies on Qunar.com's preset filters, which only offer "9:00 - 12:00" and "12:00 - 15:00," and picks one of them seemingly at random. If a matching flight happens to fall in that slot, it's pure luck; if not, AutoGLM simply books the wrong flight. In other words, this is not intelligence, just coincidence.
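To make the gap concrete, here is a minimal Python sketch contrasting a check against the user's actual constraints with a check against a site's preset time bucket. Everything in it is hypothetical: the `Flight` type, the flight numbers, the dates, and the helper functions are illustrative and are not AutoGLM's or Qunar.com's actual logic. The sketch also assumes the "11:00 to 14:00" window is inclusive at both ends.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta


@dataclass
class Flight:
    number: str          # hypothetical flight identifier
    departs: datetime    # scheduled departure
    aircraft_maker: str  # e.g. "Airbus", "Boeing", "COMAC"


def resolve_day_after_tomorrow(task_start: datetime) -> date:
    """Resolve "the day after tomorrow" once, relative to when the task started."""
    return task_start.date() + timedelta(days=2)


def matches_request(f: Flight, day: date, start: time, end: time) -> bool:
    """Check the user's own constraints: exact date, exact window, no Boeing."""
    return (
        f.departs.date() == day
        and start <= f.departs.time() <= end  # window assumed inclusive
        and f.aircraft_maker.lower() != "boeing"
    )


def in_preset_bucket(f: Flight, bucket_start: time, bucket_end: time) -> bool:
    """What a preset filter checks: only its own fixed time bucket."""
    return bucket_start <= f.departs.time() < bucket_end


task_start = datetime(2025, 1, 1, 23, 0)      # hypothetical: task issued at 23:00
day = resolve_day_after_tomorrow(task_start)  # January 3

flights = [
    Flight("FL001", datetime(2025, 1, 3, 9, 30), "Airbus"),
    Flight("FL002", datetime(2025, 1, 3, 11, 30), "Airbus"),
    Flight("FL003", datetime(2025, 1, 3, 12, 45), "Boeing"),
    Flight("FL004", datetime(2025, 1, 3, 14, 30), "COMAC"),
]

wanted = [f.number for f in flights if matches_request(f, day, time(11), time(14))]
bucket = [f.number for f in flights if in_preset_bucket(f, time(12), time(15))]
print(wanted)  # ['FL002']
print(bucket)  # ['FL003', 'FL004']
```

With these sample flights, the "12:00 - 15:00" bucket returns two flights that violate the request (one Boeing, one departing after 14:00) and misses the only valid one at 11:30. Resolving the relative date once from the task's start time is a design choice of this sketch. It keeps a task issued at 23:00 pointed at the same day even if the task runs past midnight.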
During this task I also ran into something as troublesome as logins: pop-up ads. Whenever an ad pops up, AutoGLM freezes. If the ad is "well-behaved" and disappears on its own after a few seconds, AutoGLM resumes the task. When an ad doesn't go away, however, AutoGLM needs the user to take over, which seriously disrupts the flow.
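The wait-then-hand-over behavior described above can be modeled as a polling loop with a timeout. This is a minimal sketch under stated assumptions. `handle_popup` and `Outcome` are hypothetical names. The 5-second timeout is a placeholder for the "few seconds" observed. The simulated clock passed to `popup_visible` stands in for real screen checks, so the example runs instantly. It illustrates the observed behavior only and is not AutoGLM's implementation.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    RESUME = "resume"      # ad went away on its own; continue the task
    TAKEOVER = "takeover"  # ad persisted; ask the user to take over


def handle_popup(
    popup_visible: Callable[[float], bool],
    timeout_s: float = 5.0,  # hypothetical stand-in for "a few seconds"
    poll_s: float = 1.0,
) -> Outcome:
    """Wait for a pop-up to close itself; escalate to the user after timeout_s.

    popup_visible(t) reports whether the ad is still on screen t seconds
    after it appeared (a simulated clock, so no real sleeping happens).
    """
    t = 0.0
    while t <= timeout_s:
        if not popup_visible(t):
            return Outcome.RESUME
        t += poll_s
    return Outcome.TAKEOVER


def self_closing_ad(t: float) -> bool:
    return t < 3.0  # disappears after 3 seconds


def persistent_ad(t: float) -> bool:
    return True  # never closes on its own


print(handle_popup(self_closing_ad))  # Outcome.RESUME
print(handle_popup(persistent_ad))    # Outcome.TAKEOVER
```

The fixed timeout is what makes a persistent ad cost the user a manual takeover: the loop has no way to dismiss the ad itself, so any pop-up that outlasts the timeout ends in escalation.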
Cloud Computer Task - Creating a PPT and Posting It on Xiaohongshu:
Unlike the AutoGLM cloud phone, the cloud computer can only control the browser. So when asked to produce content such as PPTs and spreadsheets, it uses Zhipu CodeX to complete the task through programming. Judging from the interface, AutoGLM on the cloud computer is essentially an Agent with browser control added on.