The Ambition and Patience of WeChat AI: 10 Insights from Reading the Mini Program AI Integration Documentation
Yesterday, WeChat released the "Guidelines for Developers to Access the WeChat AI Ecosystem."
The industry is paying close attention, and I have seen it shared all over my feed.
But few people have actually read the specific integration documents carefully; most are just saying that WeChat has finally made its move.
Last night I checked my mini-program's admin console and found that the rollout is very broad: even a mini-program of mine with almost no users, one I had not touched in a long time, was included in the gray-scale test.
So I went through the documents. Having read them end to end, I found a lot of new information that deserves closer study by the industry:
(Document address: https://developers.weixin.qq.com/miniprogram/dev/ai/guide.html)
I. The industry's reaction has been calm
Let's start with the numbers:
Both official accounts, WeChat Open Class and WeChat Developers, published the announcement.
The WeChat Open Class post has 33,000 reads, 410 likes, and 3,533 shares.
The WeChat Developers post did somewhat better, with 65,000 reads, 660 likes, and 5,647 shares.
By the standards of WeChat official accounts, both figures are middling; neither counts as a hit.
Next, look at the official demo on GitHub linked from the integration documents (https://github.com/wechat-miniprogram/ai-mode-demo).
To some extent, it shows how many developers have actually taken action.
As of noon today, it has 11 forks and 92 stars, which is unremarkable.
This suggests that most developers are still waiting on the sidelines and have not even bothered to clone and run the demo.
Of course, this is also normal.
After all, what is on GitHub is only a demo, not a complete project, and the documentation itself states that the program is "currently in the internal testing phase, and code review for the mini-program AI development mode is not yet open."
But the muted reaction also shows that many developers still see this as just another AI feature from WeChat.
That is why I want to take this document apart.
II. The principle of voluntary participation matters
The very first sentence of the "Guidelines for Developers to Access the WeChat AI Ecosystem" says the program rests on full respect for developers' rights and interests and for their independent choices.
The Q&A at the end also stresses that integration is entirely the developer's own decision, and that choosing not to integrate will not affect existing mini-program services.
Why keep repeating this?
Because WeChat knows exactly what developers fear: being forced in by the platform, and being treated badly if they stay out.
So WeChat adopts a deliberately modest stance up front: I won't force you, and if you don't integrate, your current business won't be affected.
This caution in itself shows how significant the move is.
In my view, it is also how WeChat keeps its own position logically self-consistent:
After all, when the Doubao phone launched, WeChat blocked it because it had not obtained WeChat's consent.
Relationships between platforms and developers tend to fall into one of two patterns: the big store bullying its customers, or big customers bullying the store.
WeChat is sometimes the customer (with the App Store) and sometimes the store (with mini-programs).
The voluntary principle is meant to keep the industry from seeing WeChat as the big store bullying its customers.
WeChat wants developers to come on board naturally and on reasonable terms.
This is the patience behind WeChat's AI ambition.
III. There are two Agent paradigms behind the automatic mode and the developer mode
Technically, how many ways are there for an AI to use a mini-program originally designed for humans?
The answer is two.
The first is to have the AI use the mini-program the way a human does: look at the screen and tap.
This is called a GUI Agent. Computer Use takes this approach, and so does the Doubao phone.
The logic is simple: the AI uses the phone exactly as a human would.
Its advantage is generality; its drawbacks are that it makes mistakes some of the time, is slow, and burns a lot of tokens. (Anyone who has used Computer Use with Claude Code or Codex will know what I mean.)
The second is to have the mini-program proactively expose interfaces to the AI.
The logic is the reverse: developers provide data and interfaces directly. The result is stable and accurate, but the cost is that developers have to do the work.
Automatic mode roughly corresponds to the first approach (with some differences, discussed later), and developer mode corresponds to the second.
WeChat wants both.
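To make the contrast between the two paradigms concrete, here is a toy Python sketch. Every name in it (`Element`, `PAGE`, `create_order`, `api_agent_call`) is hypothetical and unrelated to WeChat's real APIs, and plain text matching stands in for the vision model a real GUI Agent would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Element:
    label: str  # text visible on screen
    x: int      # tap coordinates
    y: int


# Hypothetical rendered page: all a GUI Agent "sees" is what is on screen.
PAGE = [
    Element("Menu", 40, 900),
    Element("Latte 28 CNY", 200, 300),
    Element("Order now", 300, 950),
]


def gui_agent_find(page: list[Element], target: str) -> Optional[tuple[int, int]]:
    """Paradigm 1 (GUI Agent): find a tap point by matching on-screen text.
    A real agent uses a vision model here, which is where errors, latency,
    and token cost come from; None stands for 'could not find it'."""
    for element in page:
        if target.lower() in element.label.lower():
            return (element.x, element.y)
    return None


def create_order(item: str, qty: int) -> dict:
    """Paradigm 2: the developer exposes the operation itself (hypothetical)."""
    return {"status": "created", "item": item, "qty": qty}


INTERFACES: dict[str, Callable[..., dict]] = {"create_order": create_order}


def api_agent_call(name: str, **kwargs) -> dict:
    """Paradigm 2 (interface-based): call a declared interface with structured
    arguments; no screen, no coordinates, no guessing."""
    return INTERFACES[name](**kwargs)
```

The GUI path has to find the right element on screen before it can act, and it comes back empty-handed as soon as a label changes. The interface path skips that step, which is why it is more stable, but someone has to write `create_order` first.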
IV. The essence of the automatic mode: screen reading plus code reading
So the question is: how can automatic mode work with zero input from developers?
The answer lies in the "WeChat AI Automatic Mode Service Terms," which few people have read carefully.
Let me translate the legalese into plain language.
The first item is the page interaction technology authorization:
WeChat AI may access or operate the mini-programs you develop and operate through page interaction technology, including but not limited to obtaining mini-program service pages for identification, reading, analysis, and processing, and, as needed, performing operations such as searching and placing orders on behalf of WeChat users, or entering and authorizing on their behalf the information the service requires.
(There is no public link to this document, so I saved a copy. If you want to read it carefully, reply "terms" to the "Weixi Zhibei" official account to get the full text. It is well worth reading.)
In plain terms: the AI acts like a real person, looking at your mini-program's pages and then tapping and placing orders for the user.
This is a textbook GUI Agent.
The WeChat AI team has also been working on screen-reading efficiency; search for POINTS-GUI-G and UI-Oceanus.
The former scored 59.9 on ScreenSpot-Pro, widely regarded as the hardest GUI grounding benchmark in the industry, which is state of the art among models of the same size.
This result came from using reinforcement learning to improve screen-reading performance.
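For readers unfamiliar with GUI grounding benchmarks: as I understand ScreenSpot-Pro, the model receives an instruction and a screenshot and must output a point to click, and a prediction counts as correct when that point falls inside the target element's bounding box. A score like 59.9 is then the percentage of hits. A minimal Python sketch of that scoring rule, with made-up data and hypothetical function names:

```python
Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (left, top, right, bottom)


def point_in_box(point: Point, box: Box) -> bool:
    """A click is a hit if it lands inside the target element's box (edges included)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: list[Point], targets: list[Box]) -> float:
    """Percentage of predicted click points that land inside their target box."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need one prediction per target, and at least one target")
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return 100.0 * hits / len(targets)
```

On a professional-software screenshot the target is often a tiny icon, so a click that is off by a few pixels already counts as a miss.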
Why are robots built in human form?
Because the physical world's infrastructure is designed for humans, and a humanoid body can reuse as much of it as possible.
The same reasoning explains why the WeChat AI team keeps investing in GUI screen reading: mini-programs are interfaces designed for humans.
But enough of that digression.
The second item is the mini-program skill generation and invocation authorization.
Under it, the platform reads your source code, parses the page structure and logic, API definitions, and interaction flows, generates a skill from them, and then operates the mini-program by following that skill.
Moving from screen reading to code reading is a level up, and it also puts this approach a level above the Doubao phone's pure-GUI method.
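The terms do not say how skills are generated. Purely to illustrate why code reading beats screen reading, here is a Python sketch that scans a page template for tap handlers instead of hunting for buttons in a screenshot. Only `bindtap` is real WXML syntax; the template, the regex, `generate_skill`, and the skill format are all assumptions.

```python
import re

# A made-up page template. `bindtap` is the real WXML attribute for tap
# events; everything else here is invented for illustration.
WXML = """
<view class="menu">
  <button bindtap="onSelectLatte">Latte</button>
  <button bindtap="onSubmitOrder">Order now</button>
</view>
"""

# Only handles simple <button bindtap="handler">label</button> markup;
# real WXML (nested tags, data bindings, other events) needs a real parser.
BUTTON_RE = re.compile(r'<button[^>]*\bbindtap="(\w+)"[^>]*>\s*([^<]+?)\s*</button>')


def generate_skill(page_path: str, wxml: str) -> dict:
    """Hypothetical 'code reading' step: list the page's tap handlers with
    their visible labels so an agent can invoke them by name, rather than
    locating buttons on screen."""
    actions = [
        {"handler": handler, "label": label}
        for handler, label in BUTTON_RE.findall(wxml)
    ]
    return {"page": page_path, "actions": actions}


skill = generate_skill("pages/menu/menu", WXML)
```

The point is the shift in what the agent relies on: a handler name like `onSubmitOrder` is stable across redesigns, whereas the button's position and styling are not.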
V. The automatic mode also has a cost
So what does automatic mode cost developers?
Look at Article 8 of the terms:
The WeChat AI function may appropriately adjust the display mode and user interaction experience of your mini-program, including but not limited to running the mini-program in the background, only displaying part of the mini-program interaction interface, and providing users with the ability to quickly confirm or abort operations.
Read that sentence again.
Running the mini-program in the background and displaying only part of the interaction interface means users will mostly never see the home page you so carefully designed.
The brand storefront, promotional slots, and ad banners you built are likely to be skipped entirely in AI mode.
WeChat does offer a partial remedy for this: atomic components (discussed later).
They let merchants render their own branded, interactive cards within the AI conversation.
WeChat also requires an entry point to the mini-program in the card's upper-right corner, linked to a configured mini-program page, which effectively leaves merchants a door inside the AI conversation.
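A minimal Python sketch of that rule, assuming a made-up `Card` type; the real card format is defined by WeChat's atomic-component specification, which this does not reproduce, and the page path in the usage is invented.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """Hypothetical card produced by an atomic component."""
    title: str            # the merchant's brand or headline
    entry_page: str = ""  # configured mini-program page behind the top-right entry


def validate_card(card: Card) -> list[str]:
    """Enforce the documented rule: every card needs an entry back into
    the mini-program, linked to a configured page."""
    return [] if card.entry_page else ["missing top-right entry to the mini-program"]


def render_header(card: Card, width: int = 32) -> str:
    """Text mock-up of the card header: title on the left, entry on the right."""
    entry = "[Open >]" if card.entry_page else ""
    return card.title.ljust(width - len(entry)) + entry


card = Card("Demo Coffee", entry_page="pages/order/detail")  # hypothetical path
```

The `[Open >]` entry is the "door" in question: whatever the AI does inside the conversation, the user can always step into the merchant's full mini-program from the card.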
Then look at Article 6:
All actions taken by WeChat AI after obtaining user authorization are regarded as an extension of WeChat users' operations and as actions of WeChat users themselves. Developers shall not arbitrarily change, block, or refuse to execute them without justifiable reasons.
In other words, once you agree to automatic mode, whenever the AI operates your mini-program on a user's behalf, it is legally treated as the user acting, and you may not stop it without a justifiable reason.
VI. The three-piece set of developer mode: atomic interfaces, atomic components, and Skills
Now for developer mode. Its core is three concepts: atomic interfaces, atomic components, and Skills.
Atomic interfaces are the smallest execution units, each encapsulating a single business function.
"Query the drink list" is an atomic interface, and so are "Create an order" and "Initiate payment". Each interface has standardized inputs and outputs.
Atomic components are the visual display units of atomic interfaces.
Think of one as a card: it is rendered from the structured data an interface outputs and displayed directly in the conversation with the AI.
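Here is a rough Python sketch of the first two pieces, using the coffee example. Everything is hypothetical: real atomic interfaces and components are written in the mini-program's own code, and the `Result` shape, the card fields, and the in-memory `MENU` are assumptions, not WeChat's format.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical in-memory menu standing in for the merchant's backend.
MENU = {"latte": 28, "americano": 22}
ORDERS: dict[str, dict[str, Any]] = {}


@dataclass
class Result:
    """One uniform output shape for every atomic interface (an assumption)."""
    ok: bool
    data: dict[str, Any]
    error: str = ""


# --- Atomic interfaces: one business function each, structured in and out ---

def query_drink_list() -> Result:
    return Result(True, {"drinks": [{"name": n, "price": p} for n, p in MENU.items()]})


def create_order(drink: str, qty: int) -> Result:
    if drink not in MENU or qty < 1:
        return Result(False, {}, "invalid drink or quantity")
    order_id = f"o{len(ORDERS) + 1}"
    ORDERS[order_id] = {"drink": drink, "qty": qty, "amount": MENU[drink] * qty, "paid": False}
    return Result(True, {"order_id": order_id, "amount": ORDERS[order_id]["amount"]})


def initiate_payment(order_id: str) -> Result:
    order = ORDERS.get(order_id)
    if order is None:
        return Result(False, {}, "unknown order")
    order["paid"] = True  # simulated; no real payment happens here
    return Result(True, {"order_id": order_id, "paid": True})


# --- Atomic component: renders one interface's output as a card ---

def drink_list_card(result: Result) -> dict[str, Any]:
    """Declarative card description; a real component is mini-program UI."""
    return {
        "type": "list_card",
        "title": "Today's drinks",
        "items": [
            {"text": f"{d['name'].title()}  {d['price']} CNY", "tap": {"select": d["name"]}}
            for d in result.data["drinks"]
        ],
    }
```

Calling `query_drink_list`, then `create_order`, then `initiate_payment` covers a whole coffee order, and `drink_list_card` is what the user would see between the first two steps. Keeping each interface this small is what lets a model pick the right one and fill in its inputs reliably.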
A Skill packages all of the above into one complete capability.
A Skill includes a business description (SKILL.md), a declaration of the capabilities the model can call (mcp.json), and the code implementing the atomic interfaces and atomic components.
A mini-program can have a maximum of 30 Skills.
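A rough Python sketch of how a Skill's parts and the 30-Skill cap fit together. The `Skill` and `SkillRegistry` classes are hypothetical, and the `tools` layout inside mcp.json is an assumption borrowed from standard MCP tool listings; the mini-program MCP format may differ. The Skill's code (its atomic interfaces and components) is left out.

```python
import json
from dataclasses import dataclass

MAX_SKILLS = 30  # per-mini-program limit stated in the documentation


@dataclass
class Skill:
    name: str
    description: str   # business description, i.e. the contents of SKILL.md
    tools: list[dict]  # callable capabilities declared in mcp.json (layout assumed)


class SkillRegistry:
    """Hypothetical per-mini-program registry that enforces the Skill limit."""

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def register(self, name: str, skill_md: str, mcp_json: str) -> Skill:
        # Re-registering an existing Skill replaces it and does not count twice.
        if name not in self.skills and len(self.skills) >= MAX_SKILLS:
            raise ValueError(f"a mini-program can have at most {MAX_SKILLS} Skills")
        tools = json.loads(mcp_json).get("tools", [])
        skill = Skill(name, skill_md.strip(), tools)
        self.skills[name] = skill
        return skill
```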
The three are connected through the mini-program MCP, and the documentation specifically stresses that the mini-program MCP differs from standard MCP:
The mini-program MCP is a set of protocols that exposes callable capabilities to the mini-program AI. Unlike standard MCP, it is adapted to the characteristics of mini-program development. Developers only need to provide a complete SKILL implementation as designed, and the mini-program AI can correctly infer and execute the corresponding atomic interfaces.
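The quoted passage says that, given the declaration, the model can "infer and execute" the right atomic interface. A minimal Python sketch of the runtime half of that idea, assuming the model has already emitted a tool call as a name plus arguments. The declaration follows the shape of standard MCP tool listings (name, description, inputSchema); since the mini-program MCP is explicitly different, treat every field and function here as hypothetical.

```python
from typing import Any, Callable

# Declared capabilities the model sees, shaped like standard MCP tool
# listings. The mini-program MCP format may differ; illustration only.
DECLARED: dict[str, dict[str, Any]] = {
    "create_order": {
        "description": "Create a drink order",
        "inputSchema": {
            "type": "object",
            "properties": {"drink": {"type": "string"}, "qty": {"type": "integer"}},
            "required": ["drink", "qty"],
        },
    },
}


def create_order(drink: str, qty: int) -> dict[str, Any]:
    """Stand-in atomic interface (hypothetical)."""
    return {"order_id": "o1", "drink": drink, "qty": qty}


IMPLEMENTATIONS: dict[str, Callable[..., dict[str, Any]]] = {"create_order": create_order}


def execute_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Take a tool call chosen by the model, check it against the declaration,
    and run the matching atomic interface."""
    name, args = call["name"], call.get("arguments", {})
    if name not in DECLARED:
        return {"error": f"unknown tool: {name}"}
    required = DECLARED[name]["inputSchema"].get("required", [])
    missing = [param for param in required if param not in args]
    if missing:
        return {"error": f"missing arguments: {', '.join(missing)}"}
    return IMPLEMENTATIONS[name](**args)
```

This only checks that required arguments are present; a real runtime would also validate types against the schema before touching business logic.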
Personally, I think the design philosophy behind this three-piece set is clear: simple, stable, and flexible.
VII. The Agent closed loop behind a cup of coffee
What does this whole thing look like when it runs?
The "Operation Mechanism" document includes a sequence diagram that makes the whole process above easy to visualize: