Stop waiting for the killer app.
Last night, a friend sent me a WeChat message:
“GPT-5.4 has been released. It can operate a computer on its own. What's the difference between this and Manus? And what's the difference with OpenClaw? Aren't they all for getting work done?”
I thought for a moment and replied:
“You're asking the wrong question. What you should be asking is: what is actually going on when all of these things appear at once?”
1. The “killer app” that everyone is waiting for may never come
In the past two years, if you've followed the discussions in the AI field, you must have heard this kind of voice:
“The model itself isn't important. What matters is the applications on it.”
“Invest in the application layer because the winners in the infrastructure have already been determined.”
“The day when the killer app appears is when the AI bubble will be truly verified.”
This sounds reasonable. Isn't that exactly how the Internet era played out? TCP/IP matters enormously, but ordinary people have no idea what it is; what they know are Netscape, Amazon, and WeChat. The same goes for the mobile Internet: Android and iOS are the foundations, but what actually changed people's lives were Douyin, Meituan, and Uber.
Naturally, everyone is waiting for that “killer app” for AI. Investors are betting their money on various Agent products. Entrepreneurs are desperately building on top of the models. Analysts keep asking: “When will the killer app finally come?”
Is this logic correct? It has been correct in every previous technological wave.
But this time, there might be a problem.
Where does the problem lie? It lies in a fundamental change: In every previous technological wave, the boundary between the platform and the application was clear. But this time, this boundary is dissolving.
In the Internet era, TCP/IP was the platform, and the browser was the application. In the mobile era, iOS was the platform, and the App was the application. The platform provides capabilities, and the application uses these capabilities. The platform doesn't interfere with the application's business, and the application can't do without the platform.
But what if the model itself is the platform? What if the model can not only provide capabilities but also use these capabilities on its own? What if the model can directly operate Excel, the browser, and the email client without an application?
Then the concept of the “killer app” loses its meaning.
Because the model itself is becoming the ultimate application.
This isn't just theoretical speculation. The release of GPT-5.4 makes it tangible for the first time.
2. Why does OpenClaw have to be acquired?
Let's talk about OpenClaw first.
It suddenly became popular last year, both inexplicably and predictably. It was inexplicable because what it does isn't new—the concept of AI operating a computer has been studied in academia for almost a decade. It was predictable because it came at the right time: In 2025, AI was already smart enough to write poems and code, but it could only stay in the chat box, watching you copy and paste over and over again.
OpenClaw did one thing: It enabled AI to see the screen, move the mouse, click buttons, and type on the keyboard.
What problem did it solve? It solved the problem of AI being “able to talk but not able to act”. No matter how smart AI is, it can only stay in the chat box. It can see the world but can't touch it. OpenClaw became that “hand”.
The community was in an uproar. The number of stars on GitHub exceeded 50,000. The “lobster-raising” craze in 2025 was essentially countless people trying: If AI can operate a computer, what can it do for me?
But OpenClaw has a fatal problem: It has hands but no brain.
You need to write a script for each task. “First click here, then enter that. If a pop-up window appears, click OK.” By the time you finish writing the script, you could have done the task yourself. It's like an intern with extremely strong execution ability but no autonomy. It can't act without a script.
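The brittleness of "hands without a brain" is easy to see in a sketch. Everything below is hypothetical: the article never shows OpenClaw's real API, so the helper functions are stubs standing in for a GUI driver. The point is only that every coordinate, menu position, and pop-up must be anticipated by a human in advance.

```python
# Hypothetical sketch of a hand-written automation script.
# These helpers are stubs, not OpenClaw's actual API.

actions_log = []

def click(x, y):
    actions_log.append(("click", x, y))

def type_text(s):
    actions_log.append(("type", s))

def popup_visible():
    return False  # stub: a real driver would inspect the screen

def export_report():
    # Every step is hard-coded: the "hand" follows orders blindly.
    click(120, 45)            # open the File menu
    click(120, 180)           # choose "Export..."
    type_text("report_q3.xlsx")
    if popup_visible():       # the script survives only the pop-ups it anticipated
        click(400, 300)       # dismiss the expected confirmation dialog
    click(520, 360)           # press "Save"

export_report()
print(len(actions_log))  # 4 recorded actions, exactly as scripted
```

If the menu moves ten pixels or an unexpected dialog appears, the whole script breaks; there is no judgment anywhere in it.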
So OpenAI acquired it.
Not to make a better OpenClaw. Not to launch a “killer app”. But to directly install that “hand” into the model's “brain”.
GPT-5.4 is that “brain” with a “hand”.
From now on, there's no need for "AI + middleware" glue. The model can see the screen, decide where to click, and click, all on its own.
The significance of OpenClaw as an independent framework has been absorbed by the model itself.
This is a signal: The tools in the application layer are being absorbed back into the model layer.
3. What about Manus? Isn't it also an “application”?
Yes. Manus is a product. It packages Agent capabilities so you can use them by scanning a QR code, without worrying about which model sits behind it. It's like a housekeeping company: you call and say "send someone to clean the house", and they send a cleaner over.
But the question is: where do the housekeeping company's cleaners come from?
The "cleaners" behind Manus are models. Today Manus uses GPT-5.4; tomorrow it could use Claude Opus 4.7; the day after, Google Gemini 4.0. The product can swap models, but at its core there is always a model.
This is why Peter Steinberger, the father of OpenClaw, chose to join OpenAI instead of starting his own company to sell the enterprise version of OpenClaw. Because he knows that the framework ultimately serves the model, and the model is the “constant base”.
Someone in the community said: “The model is the product, and the framework is just the packaging.” The difference in experience between running GPT-5.4 with OpenClaw and running a low-end model with OpenClaw can be over 40%.
So, products like Manus will still exist and will still be used. But they'll always be downstream of the model, always have to upgrade with the model, and always have to play the game within the rules defined by the model.
If in the Internet era, “applications ran on protocols”, and in the mobile era, “applications ran on systems”, what's happening now is: Applications run on models, and the models are starting to do the “running” themselves.
4. If the model is the platform, where do the application software go?
Now let's ask a deeper question: If the model becomes the platform, where do software like Excel, the browser, and the email client go?
The answer is a bit counterintuitive: They won't disappear, but they'll “retreat into the background”.
Just like cloud computing today. You don't need to know which server your code is running on. You just need to call the API. The same will be true for future application software—you don't need to open it, learn its interface, or remember where its menus are. All you need is its functionality, and the model will call these functions for you.
You can think of it this way: Application software is moving from the “front stage” to the “backstage”, from something users directly interact with to a “capability layer” that the model calls.
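One concrete form this "capability layer" takes is a function-calling-style tool schema: the application describes what it can do, and the model, not the user, decides when to invoke it. The schema below is a generic sketch in the style of several current LLM APIs; it is not GPT-5.4's actual format, and the names are invented for illustration.

```python
# A generic function-calling-style tool schema (a sketch, not any
# vendor's exact format). The mail client no longer needs a UI the
# user learns; it only needs to describe its capability.
send_email_tool = {
    "name": "send_email",
    "description": "Send an email through the user's mail client.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient address"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}

# The model emits a call like this; a thin runtime executes it.
model_call = {"tool": "send_email",
              "arguments": {"to": "boss@example.com",
                            "subject": "Q3 report",
                            "body": "Attached."}}

def execute(call, registry):
    tool = registry[call["tool"]]
    missing = [p for p in tool["parameters"]["required"]
               if p not in call["arguments"]]
    assert not missing, f"model omitted required fields: {missing}"
    return f"dispatched {tool['name']} to {call['arguments']['to']}"

print(execute(model_call, {"send_email": send_email_tool}))
```

The user never sees the schema or the runtime; they only state the intent, which is exactly what "retreating into the backstage" means.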
This isn't just my imagination. Several key designs of GPT-5.4 point in this direction:
Native computer usage ability—When the model can directly operate real software, the need to develop independent applications for specific functions is greatly reduced.
Tool search—In a multi-tool scenario, the model can dynamically load the definitions of the required tools, reducing token consumption by 47%. This means the model can manage hundreds of tools simultaneously and schedule various capabilities as needed.
Visual perception upgrade—It supports image input of up to 10.24 million pixels, allowing the model to see every pixel on the screen. Any software you can operate with your naked eye, it can too.
Combined with these capabilities, GPT-5.4 becomes a general digital executor. It's no longer just a language model but can directly act on the entire digital world.
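The "tool search" design above can be sketched as retrieval over a large tool registry: instead of putting hundreds of tool definitions into every prompt, only the few that match the task are loaded. The registry, the naive keyword matching, and the token counts below are all illustrative assumptions, not GPT-5.4's actual mechanism.

```python
# Sketch of dynamic tool loading: keep hundreds of tool definitions
# out of the prompt and retrieve only the relevant ones per task.
# Registry contents, matching rule, and token costs are illustrative.

REGISTRY = {
    f"tool_{i}": {"description": f"does task category {i}", "tokens": 120}
    for i in range(300)
}
REGISTRY["fill_spreadsheet"] = {
    "description": "fill cells in a spreadsheet", "tokens": 150}
REGISTRY["send_email"] = {
    "description": "send an email to a recipient", "tokens": 140}

def search_tools(task, registry, limit=3):
    """Naive keyword overlap, standing in for a real embedding search."""
    words = set(task.lower().split())
    scored = [(len(words & set(t["description"].split())), name)
              for name, t in registry.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:limit] if score > 0]

task = "send an email with the spreadsheet attached"
loaded = search_tools(task, REGISTRY)
full_cost = sum(t["tokens"] for t in REGISTRY.values())
loaded_cost = sum(REGISTRY[n]["tokens"] for n in loaded)
print(loaded, f"{loaded_cost} vs {full_cost} prompt tokens")
```

Only two of the 302 registered tools get loaded for this task, which is the whole idea: the model can "know about" hundreds of capabilities while paying the prompt cost of a handful.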
You don't need to learn Excel functions, PPT layout skills, or the rule settings of the email client anymore. You just need to tell the model what you want.
The complexity of the software is encapsulated by the model.
If this scenario becomes a reality, the concept of the “killer app” will become quite awkward—because users will no longer interact with applications directly. They'll only interact with the model.
5. But this leads to a deeper question
Kevin Lu, in his article “The Only Important Technology is the Internet”, put forward an insight:
The Internet is the “dual” of next-word prediction.
What does this mean? Next-word prediction requires a huge amount of sequential data, and the Internet happens to provide this data. Moreover, the Internet isn't just about having a lot of data. The key is that it's diverse enough—from primary school to doctoral level, from mainstream to niche, from “aligned” to “unaligned”. When a model is trained on the Internet, it learns not only knowledge but also the complexity and diversity of the world.
So he said: The Internet is the “primordial soup” for next-word prediction.
Then the question is: If the dual of next-word prediction is the Internet, what's the dual of reinforcement learning?
Next-word prediction uses the Internet to solve the data problem. What does reinforcement learning need? It needs scalable, diverse, and self-driven reward signals.
The current data sources for reinforcement learning are too narrow:
- Human preference feedback is difficult to collect, has large individual differences, and is noisy.
- Verifiable rewards are limited to narrow fields such as mathematics and programming.
- Robot data is costly to collect and difficult to scale.
- Transaction data may cause the model to learn to “not participate in the game”.
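The "verifiable rewards" bullet is the easiest one to make concrete: in domains like math or code, a reward can be computed mechanically by checking the answer against a ground truth. The sketch below is a generic illustration of why these signals are cheap but narrow; the checker only exists where a ground truth does.

```python
# Minimal sketch of a verifiable reward: in math, correctness is
# mechanically checkable. That is exactly why such rewards stay
# confined to narrow domains; most tasks have no such checker.

def math_reward(model_answer: str, ground_truth: float) -> float:
    """Return 1.0 if the model's numeric answer matches, else 0.0."""
    try:
        return 1.0 if abs(float(model_answer) - ground_truth) < 1e-9 else 0.0
    except ValueError:
        return 0.0  # an unparseable answer earns nothing

print(math_reward("42", 42.0))         # checkable: reward is well-defined
print(math_reward("forty-two", 42.0))  # correct in prose, but the checker scores 0
# For "write a tactful email", no ground_truth exists: no checker, no reward.
```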
The combination of GPT-5.4 and OpenClaw may suggest a direction: The computer operation trajectory itself may become the “Internet” for reinforcement learning.
When the model starts to operate the computer on a large scale, every click, every keyboard input, and every successful task completion can become training data for reinforcement learning. Just as the Internet provides a huge amount of text for next-word prediction, the computer operation trajectory may provide a huge amount of behavioral data for reinforcement learning.
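If operation trajectories are to become reinforcement learning's "Internet", each session would need to be logged as a sequence of (observation, action) steps plus an outcome that can serve as a reward signal. The record format below is purely speculative: nothing in this article describes how OpenAI actually logs anything, and every field is an assumption.

```python
# Speculative sketch of a computer-use trajectory as RL training data:
# (observation, action) steps plus an outcome convertible to a reward.
# Every field name here is an assumption made for illustration.

trajectory = {
    "task": "fill out the expense form and submit it",
    "steps": [
        {"observation": "form page loaded",    "action": ("click", "amount field")},
        {"observation": "amount field focused", "action": ("type", "128.50")},
        {"observation": "form complete",        "action": ("click", "submit button")},
    ],
    "outcome": {"task_completed": True, "user_corrections": 0},
}

def reward(traj):
    """Turn the outcome into a scalar: success, discounted per user correction."""
    if not traj["outcome"]["task_completed"]:
        return 0.0
    return max(0.0, 1.0 - 0.2 * traj["outcome"]["user_corrections"])

print(len(trajectory["steps"]), reward(trajectory))
```

Note where the reward comes from in this sketch: the user's corrections. Every time you fix the model's work, you are labeling its trajectory, which is the "learning from you" the next paragraph describes.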
In other words, GPT-5.4 isn't just here to do the work for workers. It's here to accumulate experience for itself. Every time it helps you fill out a form or send an email, it's learning how to do it better next time.
You think you're using it, but actually, it's learning from you.
This is also why OpenAI acquired OpenClaw. What they want isn't just a single lobster that can work but the trajectory data of millions of lobsters working together.
6. The model doesn't equal everything: What's left in that “everything”?
If you've read this far, you may have an impression that the model is swallowing everything. Applications are gone, software is retreating, and in the future, the whole world will be talking to a dialog box.
But there's something missing in this picture: humans.
Not humans as “users” but humans as “part of the world”.
Kevin Lu, in that article, talked about an easily overlooked truth: The intelligence of a model depends on the diversity of the world it contacts.
He used a counterintuitive study to illustrate this point: To train an “aligned” model, you actually need to include “unaligned” content in the pre-training data—such as toxic remarks on 4chan. Why? Because the model needs to learn to distinguish between “aligned” and “unaligned”. If you only show it filtered and pure data, it won't be able to truly understand what “aligned” means because it hasn't seen the opposite.
The conclusion of this study is: The “goodness” of a model depends on it having seen “badness”. The “kindness” of a model depends on it understanding “evil”. The “correctness” of a model depends on it having come into contact with “errors”.
What does this mean?
It means that the intelligence of a model doesn't come out of thin air, nor can it be infinitely improved just by algorithm optimization. It depends on a premise: There's a world that's diverse enough, complex enough, and “imperfect” enough for it to learn from.
This world is the Internet. The Internet full of junk information, wrong opinions, marginal cultures, and minority languages. The Internet where people are arguing, spreading rumors, refuting rumors, and exposing false claims every day. The Internet that has both Khan Academy and 4chan, both Wikipedia and Reddit chaos.
If that world becomes uniform, pure, and orderly, the model's intelligence will degenerate.
So, the first meaning of “the model doesn't equal everything” is: The model depends on the world outside of it. The diversity of that world is the soil for the model's intelligence. Without the soil, the crops will die.
7. There's a second layer: Who defines “good”?
Daisy Alioto, in her article “The Future of Media is a Bank”, put forward a concept called “proxy capital”.
She said that in the future, the people with the most proxy capital will be those with the most excellent taste.
Why? Because when AI agents consume content for you, allocate your entertainment budget, and filter information for you, they need to know what is “good”. And this “good” can't be defined by algorithms. It can only be defined by your taste.
Your taste is the data for training your AI agents.
Applying this logic to a model like GPT-5.4: When the model starts to operate the computer for you, do work for you, and make decisions for you, it needs to know what is “correct”. Part of this “correctness” comes from verifiable rewards (such as getting a math problem right), but a larger part comes from you.
Your judgment, your preference, and the standard you consider “good” are becoming the data for training the model.
When information becomes easily accessible, the real advantage doesn't lie in who gets the information first but in the ability to perceive, interpret, and “summon reality” from the flood of information.
Put simply: In the past, the competition was about who had more information and who could calculate faster. In the future, the competition will be about who can tell the model what is right, what is good, and what is worth pursuing.
So