
Humanoid robots have also started raising lobsters.

YouJieUnKnown · 2026-03-16 08:26
Is OpenClaw rewriting the competition logic of humanoid robots?

Whether OpenClaw has truly revolutionized how office workers work remains to be seen, but embodied intelligence seems to be on the verge of being disrupted.

Recently, many videos like this have been circulating online. Netizens who weren't satisfied with mere "cyber shrimp farming" connected OpenClaw to a camera and a robotic arm, and found that OpenClaw can not only work on a computer but also perform well in the physical world.

For example, some netizens equipped OpenClaw with a computer, a robotic arm, and a camera. They didn't write a new program for this task or train a separate model. They just said to OpenClaw, "Sort these car parts."

OpenClaw completed the sorting of these parts.

How big an impact does this have on embodied intelligence?

Well, less than a year ago, capabilities like these were something humanoid robot companies would hold dedicated press conferences for and spend millions of dollars promoting worldwide.

But now, the same capabilities are easily achieved by OpenClaw, and it's not even a tool specifically designed for embodied intelligence.

So, this whole thing sounds a bit absurd.

Then, what exactly does OpenClaw bring to humanoid robots? With OpenClaw being so powerful, does a dedicated large embodied model still make sense? Has the previous work of robot companies been in vain? Why can OpenClaw easily achieve what robot companies have worked hard on for years?

And when the tide goes out, who will be left exposed?

When "Shrimp Farming" Meets Robots

I still remember that around early April last year (2025), a leading domestic humanoid robot company solemnly held a press conference in Beijing to introduce a humanoid robot development platform.

At that time, the core highlight of this platform was that it could sort scattered parts in an industrial scenario with just voice commands, with smooth movements and a low error rate.

Sound familiar? It's almost the same as what OpenClaw can do today.

The difference is that this platform released by the company is specifically for robots. It breaks down dozens of scenarios, trains agents, and then connects them through behavior path planning, which involves a lot of work.

At that time, the company's slogan for this platform was: "The most important piece of the puzzle for humanoid robots to move from performance to work and from the laboratory to the factory." Now, OpenClaw seems to easily achieve similar capabilities, but obviously, it didn't go through all that.

What's this like? It's like you and your friend go climbing together. You prepare carefully, set off early, and spend a lot of time. When you finally reach the mountaintop out of breath, you find that your friend has been waiting for you here for a long time, having come by helicopter.

Specifically, OpenClaw also shows strong generalization, decision-making, and self-evolution abilities in more scenarios.

For example, one experiment featured a more everyday test. The staff said to the robotic arm, "Today is the Lantern Festival. Make me some sweet rice-wine dumplings."

The robotic arm first paused to think about the task and then started to execute: it poured the soup into the pot, put the dumplings in, and waited for the water to boil.

Midway, the staff asked, "Can you add some sugar?"

The robotic arm asked back, "Brown sugar or osmanthus sugar?"

After getting the answer "brown sugar", it poured the sugar into the pot.

There are also various other experiments. For example, some developers connected OpenClaw to an industrial robotic arm and asked it to complete grasping or handling tasks according to natural language instructions. The system can even automatically generate Python scripts to control the robotic arm.
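The article doesn't show the generated code itself, but the idea can be sketched. The following is a minimal, hypothetical example of the kind of script an agent might emit for a grasp-and-move instruction; `RoboticArm`, its methods, and all poses are illustrative stand-ins for a real arm SDK, not any actual API.

```python
# Hypothetical sketch: the kind of script an agent might generate for
# "pick up the bolt and move it to the tray". RoboticArm is a mock
# stand-in for a real SDK; method names and poses are assumptions.

class RoboticArm:
    """Minimal mock of an arm SDK: tracks end-effector pose and gripper state."""
    def __init__(self):
        self.pose = (0.0, 0.0, 0.0)   # (x, y, z) in meters
        self.holding = None

    def move_to(self, x, y, z):
        self.pose = (x, y, z)

    def grasp(self, obj):
        self.holding = obj

    def release(self):
        obj, self.holding = self.holding, None
        return obj

def pick_and_place(arm, obj, src, dst):
    """Move to the source pose, grasp, carry to the destination, release."""
    arm.move_to(*src)
    arm.grasp(obj)
    arm.move_to(*dst)
    return arm.release()

arm = RoboticArm()
moved = pick_and_place(arm, "bolt_M6", src=(0.2, 0.1, 0.05), dst=(0.5, 0.3, 0.05))
print(moved)
```

The point is not the code's sophistication but that a language model can produce such a script on the fly from a natural-language instruction, instead of a developer writing it per task.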

In addition to robotic arms, quadruped robots also quickly appeared in various "shrimp farming" experiments.

In some videos circulating on Reddit and X, some developers connected OpenClaw to a robotic dog and let it patrol autonomously in the environment.

In the past, such robots usually needed remote control or a pre-designed route to follow. But in these experiments, there was no controller and no pre-planned route. The robotic dog judged and planned on its own based on what it saw through its camera, avoiding obstacles or re-planning its route when it encountered new situations.

When these experiments start to involve humanoid robots, things become even more interesting.

For example, in an open-source community, someone released a set of Unitree robot skills for OpenClaw. With this integration, developers can directly control Unitree robots such as the G1, and even the larger H1, as well as the quadruped robots Go1 and Go2, through instant messaging software.

The whole process is much simpler than expected. Developers don't need to open a complex graphical interface or manually call the SDK. They just need to send a message in the chat window:

"Move forward one meter."

"Turn left 45 degrees."

The robot will execute the corresponding actions.

This control is even two-way. OpenClaw can obtain environmental images from the stereo camera on the robot and send the screenshots directly back to the chat window, allowing developers to check the scene at any time. If a path-planning module is connected, the system can also automatically plan routes and avoid obstacles.

Similarly, there is no pre-set script or pre-planned action path in the whole process.

Developers just set a goal, and the rest is left to the AI, which makes its own judgments and plans.
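To make the chat-to-robot flow concrete, here is a toy sketch of how messages like the ones above could be mapped to structured robot commands. In a real integration the language model itself would do this mapping; the regex dispatcher, command schema, and phrasings below are all illustrative assumptions.

```python
import re

def parse_chat_command(text):
    """Toy mapping from chat messages to structured robot commands.

    A real OpenClaw-style integration would let the language model
    interpret free-form text; this rule-based dispatcher only
    illustrates the shape of the resulting command objects.
    """
    m = re.match(r"move (forward|backward) ([\d.]+) meters?", text, re.I)
    if m:
        sign = 1.0 if m.group(1).lower() == "forward" else -1.0
        return {"action": "translate", "distance_m": sign * float(m.group(2))}

    m = re.match(r"turn (left|right) ([\d.]+) degrees?", text, re.I)
    if m:
        sign = 1.0 if m.group(1).lower() == "left" else -1.0
        return {"action": "rotate", "angle_deg": sign * float(m.group(2))}

    # Anything unrecognized is passed through for the model to handle.
    return {"action": "unknown", "raw": text}

print(parse_chat_command("Move forward 1 meter"))
print(parse_chat_command("Turn left 45 degrees"))
```

A command object like `{"action": "translate", "distance_m": 1.0}` is then what the lower-level motion interface would actually consume.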

Can a Small Lobster Overturn Humanoid Robots?

From various demonstration videos, we can see the amazing capabilities of OpenClaw combined with other large models.

In the past, these capabilities would have been the proudest achievements of many humanoid robot companies, but now they seem ordinary.

Therefore, it's inevitable to wonder: Is the ability that the robot industry has achieved over the years through data collection, model training, and system development still valuable?

The answer is: of course it is.

Let's start from the beginning. Apart from the robot's body, the decision-making system can be roughly divided into four layers from top to bottom:

Decision-making layer (brain): understands the goal and breaks down the task;

Perception/representation layer: identifies the environment, target, and spatial state;

Behavior organization layer: breaks the task into skills and action sequences;

Control layer (cerebellum): handles trajectory, servo control, obstacle avoidance, and safe execution.

Under this framework, OpenClaw is mainly responsible for the invocation, arrangement, and connection of the capabilities of the previous layers. As for how the robot finally moves and whether the actions can be stably implemented, it still depends on the underlying control system, kinematic solution, and execution link.

Therefore, OpenClaw doesn't make the robot suddenly learn to move. Instead, it's more like an upper-level scheduling system that translates human instructions into a series of callable capabilities.
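This "upper-level scheduling" role can be sketched in a few lines: the orchestrator never moves the robot itself, it only sequences calls into lower-layer skills. The skill registry, skill names, and the "sort parts" goal below are illustrative assumptions, not any real OpenClaw interface.

```python
# Minimal sketch of an upper-level scheduler: it decomposes a goal into
# calls against a registry of lower-layer skills. Skill names and the
# goal format are illustrative assumptions.

SKILLS = {}

def skill(name):
    """Decorator that registers a function as a callable skill."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register

@skill("perceive")
def perceive(world):
    # Perception layer stand-in: find sortable parts in the scene.
    return [obj for obj in world if obj["kind"] == "part"]

@skill("pick")
def pick(part):
    return f"picked {part['id']}"

@skill("place")
def place(part, bin_name):
    return f"placed {part['id']} in {bin_name}"

def orchestrate(goal, world):
    """Translate a high-level goal into an ordered sequence of skill calls."""
    log = []
    if goal == "sort parts":
        for part in SKILLS["perceive"](world):
            log.append(SKILLS["pick"](part))
            log.append(SKILLS["place"](part, bin_name=part["type"]))
    return log

world = [
    {"kind": "part", "id": "p1", "type": "bolts"},
    {"kind": "tool", "id": "t1", "type": "wrench"},
    {"kind": "part", "id": "p2", "type": "nuts"},
]
plan = orchestrate("sort parts", world)
print(plan)
```

Note that the `pick` and `place` skills here return strings instead of driving motors: whether those actions land stably in the physical world is exactly the part that still belongs to the control layer, not the scheduler.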

There are actually two real highlights worth noting here.

First, OpenClaw has changed the way robots acquire these capabilities.

In the past, many capabilities were not impossible to achieve, but often required a large amount of data collection, specialized training, and complex rule engineering for a single task.

Now, OpenClaw can directly use mature multi-modal models, tool systems, and modular execution links to turn many capabilities that originally needed to be developed and trained separately into directly callable and quickly combinable capabilities.

As a result, for the same grasping, searching, or inspection task, development efficiency is higher, the trial-and-error cycle is shorter, and the overall cost is lower.

Second, OpenClaw enables robots to have a capability that was rarely truly established in the past: continuous memory of the real world.

Traditional robots mainly work "in the moment". They react to what they see, and after the task is completed, their understanding of the environment mostly stays at that moment. Many systems can certainly create maps, perform positioning, and save task states, but they usually don't continuously organize "locations, objects, events, and time" into a unified memory structure that can be called at any time.

Now, OpenClaw is starting to try to organize the important objects, locations, events, and times perceived by the robot into a retrievable spatio-temporal semantic memory.

This means that the robot is no longer just executing commands but continuously accumulating context.

When a person enters the room, where an object is placed, and at what time a behavior occurs, all these can be used as the basis for subsequent search, judgment, and action.

Of course, this doesn't mean that it already has a complete understanding of the world like a human, but at least it shows that it is starting to have a structured memory ability for the real world.
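A minimal sketch of such a spatio-temporal memory, under the assumption that events are stored as (time, subject, location, description) records and queried by field; the class and field names are invented for illustration and don't describe any actual OpenClaw data structure.

```python
from datetime import datetime, timedelta

class SpatioTemporalMemory:
    """Toy sketch of a retrievable spatio-temporal memory: each perceived
    event is stored as a record and can be queried by any field.
    Schema and field names are illustrative assumptions."""

    def __init__(self):
        self.events = []

    def record(self, subject, location, description, when=None):
        self.events.append({
            "when": when or datetime.now(),
            "subject": subject,
            "location": location,
            "description": description,
        })

    def query(self, **filters):
        """Return all events matching every given field, oldest first."""
        return [e for e in self.events
                if all(e[k] == v for k, v in filters.items())]

t0 = datetime(2026, 3, 16, 8, 0)
mem = SpatioTemporalMemory()
mem.record("red mug", "kitchen table", "placed down", when=t0)
mem.record("visitor", "front door", "entered room", when=t0 + timedelta(minutes=5))
mem.record("red mug", "sink", "moved", when=t0 + timedelta(minutes=20))

# "Where was the red mug last seen?" -- answered from accumulated context,
# not from a fresh perception pass.
last = mem.query(subject="red mug")[-1]
print(last["location"])
```

The payoff is the last query: a question about the past is answered from accumulated records rather than from what the camera sees right now, which is exactly the shift from "working in the moment" to "accumulating context".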

The significance of this is that the boundary of robot capabilities is extending from "completing a single task" to "continuously understanding an environment". (In the same or similar environments, continuous context improves task continuity and local stability, but this doesn't mean the system has acquired broad generalization ability.)

Of course, OpenClaw can achieve these things not out of thin air. There are two important reasons behind it.

The first reason is that in recent years, the underlying architecture of robots has started to change.

In the past, many robot systems were more like isolated silos: perception was one set, planning was another, and control was yet another. The connections between them were complex, and the development threshold was high. Although many capabilities already existed, they were difficult to call flexibly.

Now, robot systems are becoming more and more modular and standardized. Cameras, robotic arms, grasping modules, path planning, and low-level control interfaces are gradually becoming plug-and-play, combinable capability units.

OpenClaw seems so powerful not because it creates underlying robot capabilities out of thin air but because it can reorganize these capabilities on top of an increasingly standardized execution stack.

The second reason is that multi-modal large models are quickly integrating previously scattered capabilities.

In the past, when a humanoid robot needed to complete a task, it often had to solve many problems separately: text understanding, voice recognition, image recognition, video understanding, target detection, spatial judgment, and task breakdown each typically required its own module.

But now, multi-modal large models can simultaneously process different types of information such as text, images, voice, and video and understand them in the same context. This means that the perception and understanding capabilities that robots previously needed to train and connect separately are being gradually absorbed by more general basic models.

This has significantly lowered the development threshold of the upper - level intelligence of robots. And this is the significance of OpenClaw. It doesn't reinvent these capabilities but connects these already stronger general capabilities to the robot system more efficiently.

Does the Large Embodied Model Still Make Sense?

At this point, a more crucial question naturally arises: Since the basic models are becoming stronger and stronger, does it still make sense to develop a dedicated large embodied intelligence model?

After all, before this, many humanoid robot companies had made high-profile announcements about self-developed large embodied models and regarded them as their most important strategic core, as if whoever mastered the embodied model would master the future of robots.

But now, it seems that general basic models are quickly filling in the capabilities of understanding, perception, and task arrangement. Some of the upper - level capabilities that robot companies spent many years building are being quickly generalized by larger basic model systems.

The answer is: Yes, and it's still important.

The reason is that the strengthening of basic models mainly changes the robot's ability to "understand the world", while the large embodied model truly determines the robot's ability to "perform actions in the physical world".

Understanding a sentence, identifying a target, and breaking down a task are indeed becoming general capabilities. But the hardest part for robots was never just understanding and seeing. It's whether the actions are actually feasible in the real world: whether the grasping angle is correct, whether the trajectory is stable, whether the contact force stays under control, whether the robot can continue after the target is occluded, recover after a failed grasp, and still succeed in a different scenario, with a different object, or on a different machine.

These problems cannot be automatically solved just by stronger "understanding ability".

The value of the large embodied intelligence model lies not in taking on every task but in accumulating a large body of experience about actions, operations, and interactions, so that the robot can go beyond one-off demos and form stable, reusable, generalizable capabilities.

In other words, the general model is taking over the "understanding layer", while the embodied model still holds the "action layer" and the "physical implementation layer".

So, the large embodied model is not meaningless. Its role is changing: in the past, it was positioned as an all-in-one "full-stack brain", and now it's more like a key layer in the entire robot system that determines the upper limit of capabilities.

Finally, let's go back to the original question. What exactly does OpenClaw bring to the humanoid robot industry?

The answer is that it forces the entire industry to confront a fact sooner than expected: the upper-level task intelligence of humanoid robots is rapidly becoming a general-purpose commodity.

In the past, the most scarce ability for many companies was to integrate understanding, perception, planning, and invocation into a working system. But now, with the maturity of multi - modal basic models and Agent frameworks, this threshold is rapidly decreasing.

It will become easier and easier to create a decent demo, which also means that the robot industry is entering deeper waters.

In the future, the competition will no longer be about who can first create a demonstration that "understands instructions", but about who can make the actions stable, raise the success rate, and turn the system into a product with low latency, reproducibility, mass-production readiness, and safe deployment. What really determines the outcome will be deeper, more specialized capabilities: control, data, robustness, engineering, and mass production.

In other words, OpenClaw lowers the threshold for creating demos but doesn't lower the difficulty of creating products.

And this is exactly its biggest impact on the industry: companies still operating at the surface, relying on hand-made demos to tell stories, will quickly see their competitiveness squeezed. When the tide goes out, we'll know who's been swimming naked.

This article is from the WeChat official account "YouJieUnKnown". Author: Qian Jiang, Editor: Shan Cha. Republished by 36Kr with authorization.