The Open-Source Revolution of Robots: The Four Forces and Games Behind the "Free Brains"
Around February this year, Xiaomi, Ant Group, Alibaba DAMO Academy, and Unitree successively released open-source robot models. Even earlier, NVIDIA launched GR00T N1.6 at CES, once again upgrading what it calls "the world's first open foundation model for humanoid robots".
These consumer-electronics companies, Internet giants, and chip empires have all recently made robot "brains" freely available to the world. What strategies and trillion-dollar bets lie behind the open-source robot-model ecosystem?
In this article, we continue our robot series. In the previous installment, "Closed-Source Robot Models", we analyzed the VLA models now dominant in embodied intelligence, dissected the differing approaches of closed-source giants such as Tesla and Figure, and examined how they built moats from their hardware and data advantages. Here, after in-depth conversations with researchers from top embodied-intelligence laboratories worldwide, we explore the core players and key technology leaders on the open-source algorithm route.
Meanwhile, we will try to answer these three questions:
First: What technical routes do these open-source models take, and why can they challenge the giants?
Second: What motivates open-sourcing? What counts as "true" open source, and what is "fake" open source?
Third: What does the open-source model ecosystem look like? What can the open-source community rely on to compete with opponents like Tesla?
01 Panorama of Open-Source Models: Who Is Involved, and What Routes Are They Taking?
In the open-source category, the VLA model remains mainstream. Simply put, it lets a robot "see" its surroundings, "understand" your instructions, and then "perform" the correct actions.
Currently, open-source VLA models can be roughly divided into four forces:
1. The academic school: small parameter counts that achieve great results with limited resources. Representative models are OpenVLA and Octo.
2. The giant-ecosystem school: they develop not only models but entire toolchains. Representatives are NVIDIA's GR00T N1 and Google's Gemini Robotics.
3. Start-ups and Chinese forces: Zibianliang, OpenMind, Xiaomi, Ant Group, and others.
4. The extreme-technology school: they pursue extreme accuracy and generalization ability. The representative model is Physical Intelligence's π₀.
1.1 The Idealism of the Academic School
OpenVLA gained wide recognition in June 2024. This open-source model, with only 7 billion parameters, outperformed Google DeepMind's "top-tier" RT-2-X across 29 robot manipulation tasks. RT-2-X has 55 billion parameters, roughly eight times OpenVLA's size, with Google's full computing and data resources behind it. Yet OpenVLA's success rate came out 16.5% higher than RT-2-X's.
OpenVLA achieved so much with so little thanks to a very clever architectural design: two visual encoders plus a large language model.
Google's RT-2-X, by contrast, uses a single visual encoder. Imagine one very smart person doing everything alone: highly capable, but less efficient at processing information.
OpenVLA's two visual encoders give it "two pairs of eyes". The first, DINOv2, is responsible for understanding spatial relationships; the second, SigLIP, specializes in semantics and common sense. The then-current open-source large language model Llama 2 acts as the "brain", fusing the spatial and semantic information to process instructions and reason.
Simply put, OpenVLA works like a small three-member team: by separating the two types of visual information, optimizing each, and then deciding jointly, the whole becomes stronger than the parts. Roughly, "three cobblers with their wits combined equal one Zhuge Liang". This architecture proves that in embodied intelligence, simply being "large" does not mean being "smart".
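To make the two-encoder idea concrete, here is a minimal numerical sketch of the fusion step. Everything in it is a loud assumption for illustration: the feature sizes, the random stand-in "encoders", and the projection weights are placeholders, not OpenVLA's real components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes (illustrative only, not OpenVLA's real dims).
D_SPATIAL, D_SEMANTIC, D_LLM = 1024, 768, 4096

def spatial_encoder(image):
    """Stand-in for a DINOv2-style encoder: spatial/geometric features."""
    return rng.standard_normal(D_SPATIAL)

def semantic_encoder(image):
    """Stand-in for a SigLIP-style encoder: semantic/common-sense features."""
    return rng.standard_normal(D_SEMANTIC)

# A learned projection would map the concatenated visual features into the
# language model's embedding space; here the weights are random placeholders.
W_proj = rng.standard_normal((D_SPATIAL + D_SEMANTIC, D_LLM)) * 0.01

def fuse(image):
    """Concatenate both encoders' outputs, then project to one LLM token."""
    feats = np.concatenate([spatial_encoder(image), semantic_encoder(image)])
    return feats @ W_proj

visual_token = fuse(image=None)
print(visual_token.shape)  # (4096,)
```

The key design point the sketch captures is that the two feature streams stay separate until the projection, so each encoder can specialize before the language model sees a fused representation.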
OpenVLA also benefits from its dataset, "Open X-Embodiment", itself a powerful advantage of the open-source ecosystem that we will elaborate on later.
In addition, OpenVLA optimized its action representation and training strategy, so its victory over Google was really the combined result of data, architecture, and training strategy.
After the win, OpenVLA was fully open-sourced: code, model weights, and training scripts were all made public. This openness energized the industry and spawned a wave of follow-up optimizations, inference accelerations, and fine-tunings.
It is a classic open-source story: innovative methods achieving great results with limited resources, then driving follow-up work across the whole field.
Now for another typical open-source route: Octo. If OpenVLA represents "scaled-up open source", Octo represents "popularized open source".
Generalization is a major challenge for robot algorithms. The old standard approach trained a policy for a specific robot on a specific dataset; change the robot or the environment and you retrain from scratch. Some in the open-source community instead pursue a "generalist robot model" that extends zero-shot to a wide range of robots and scenarios. This path is called the "generalist robot policy", and Octo is a representative of it.
Octo has only tens of millions of parameters, smaller than OpenVLA. It is a Transformer-based diffusion policy model designed for flexibility and scalability: it supports multiple robot platforms and sensor configurations and can quickly adapt to new observation and action spaces through fine-tuning, which lets it be applied widely across robot-learning scenarios.
Octo is positioned not as the most powerful model but as one everyone can use: a lighter, faster-to-adapt generalist policy foundation model for the open-source community.
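As a rough illustration of what "diffusion policy" means, the toy sketch below iteratively denoises Gaussian noise into an action vector. The denoiser, step rule, action dimension, and step count are all simplified assumptions of mine; Octo's actual denoising head is a trained Transformer conditioned on observations and language instructions.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTION_DIM, STEPS = 7, 10  # e.g. a 7-DoF arm; 10 denoising steps (assumed)

# Pretend this is the action the policy has learned for the current scene.
target = np.zeros(ACTION_DIM)

def denoiser(noisy_action, t):
    """Stand-in for a trained denoising head: predicts the noise component.
    A real model would condition on camera images and the instruction."""
    return noisy_action - target

def sample_action():
    """Iteratively refine pure Gaussian noise into an action (DDPM-style)."""
    a = rng.standard_normal(ACTION_DIM)
    for t in range(STEPS, 0, -1):
        # Each step removes a fraction of the predicted noise.
        a = a - (1.0 / STEPS) * denoiser(a, t)
    return a

action = sample_action()
print(np.abs(action - target).max())  # much smaller than the initial noise
```

The point of the diffusion formulation is that the policy outputs a whole continuous action (or action chunk) by refinement rather than by picking from discretized action tokens, which tends to handle multimodal demonstrations better.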
1.2 The One-Stop Ecosystem of the Giants
At the GTC conference in March 2025, Jensen Huang personally presented the release of GR00T N1, billed as "the world's first open foundation model for humanoid robots". By CES in January 2026 it had been iterated to version N1.6.
GR00T N1 adopts a dual-system architecture: "System 2", built on a vision-language model, handles slow thinking, understanding the environment, interpreting instructions, and planning; "System 1", built on a diffusion Transformer, handles fast thinking, converting plans into precise joint actions at high frequency. The two systems are trained jointly end-to-end and are tightly coupled.
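The slow/fast split can be sketched as a control loop in which a "System 2" planner updates infrequently while a "System 1" controller issues joint commands every tick. All rates, dimensions, and stand-in functions below are illustrative assumptions, not GR00T N1's real interfaces.

```python
import numpy as np

rng = np.random.default_rng(1)
SLOW_HZ, FAST_HZ = 10, 120   # hypothetical planning vs. control rates
N_JOINTS, TICKS = 20, 24     # joints to command; control ticks to simulate

def system2_plan(observation, instruction):
    """Stand-in for the VLM 'System 2': slow deliberation producing a plan
    (here just a latent goal vector; the real model emits richer plans)."""
    return rng.standard_normal(8)

def system1_act(plan, proprioception):
    """Stand-in for the diffusion-Transformer 'System 1': fast mapping from
    the current plan plus robot state to joint commands."""
    return np.tanh(proprioception[:N_JOINTS] + plan.mean())

plan = None
commands = []
for tick in range(TICKS):
    # System 2 replans only every FAST_HZ // SLOW_HZ ticks (slow thinking).
    if tick % (FAST_HZ // SLOW_HZ) == 0:
        plan = system2_plan(observation=None, instruction="pick up the cup")
    # System 1 issues a joint command on every tick (fast thinking).
    commands.append(system1_act(plan, proprioception=np.zeros(N_JOINTS)))

print(len(commands), commands[0].shape)  # 24 (20,)
```

The design choice the sketch highlights is the decoupled update rates: deliberate reasoning can afford to be slow as long as the action loop keeps running at control frequency against the latest plan.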
The model has 2.2 billion parameters, with weights and code both public, and many leading humanoid-robot companies received early access. NVIDIA also supplied the surrounding ecosystem, not just the model: Omniverse for digital twins, Isaac Sim for generating synthetic training data, Cosmos for generating video data, and the Newton physics engine for simulation, a one-stop service.
Google has also been steadily investing in generalist robot policies. Its early RT-1 open-sourced code and data, but the more powerful RT-2 and the later RT series went closed-source.
Recently Google has been accelerating too. In 2025 it released the Gemini Robotics series of models and hired Aaron Saunders, former chief technology officer of Boston Dynamics, as vice president of hardware engineering. DeepMind CEO Demis Hassabis describes the vision as "the Android of the robot world": a general robot operating system with Gemini as the "brain" of all kinds of robots.
At CES 2026, Boston Dynamics and Google DeepMind announced a strategic partnership to integrate the Gemini Robotics model into the Atlas humanoid robot, with joint research soon to begin in both companies' laboratories.
Google has swung from open source to closed source and now aims to be "the Android of the robot world". The pivots have been quick and the ambition is large, but it is unquestionably one of the most important players in the robot industry, and we look forward to its next move.
1.3 Start-ups and Chinese Forces
China's participation in open-source embodied intelligence is accelerating, shifting from merely "following" to "helping define the rules".
Xiaomi-Robotics-0, released on February 12th, has 4.7 billion parameters and uses a hybrid MoT architecture that separates the "brain" (vision-language understanding) from the "cerebellum" (action execution), mitigating the inference-latency problem common to VLA models. The model is open-source and can run on consumer-grade GPUs.
Ant Group's LingBot-VLA takes another route, emphasizing cross-embodiment generalization. It was pre-trained on more than 20,000 hours of real-robot data collected on 9 different dual-arm robots, aiming for "one brain controlling all types of robots", similar to the "generalist robot policy" route mentioned earlier.
X-VLA, jointly launched by Tsinghua AIR and the Shanghai AI Laboratory, set new records on five major simulation benchmarks. Its code, data, and weights are all public, making it one of the most thorough open-source examples in academia.
Xinghaitu has open-sourced its real-robot dataset and its latest G0 Plus VLA model; Zhibot's GO-1 has been deployed on real robots to perform tasks; Xingdong Jiyuan's ERA-42 is exploring its own route.
In addition, Zibianliang Robotics is a Chinese embodied-intelligence start-up focused on developing the "brain" of general-purpose robots. Its CTO, Wang Hao, discussed the motivation for open-sourcing in an interview with the Silicon Valley 101 podcast.
Wang Hao
CTO of Zibianliang Robotics
We have continuously carried forward the open-source spirit and absorbed a great deal of experience from it. We used roughly tens of thousands of hours of real-world data and built on a pre-trained foundation vision-language model, giving it strong visual understanding, spatial reasoning, and multilingual instruction-following abilities, while keeping the accuracy of its action generation relatively high. We