Just one day after Jensen Huang released Cosmos 3, a Chinese company overtook it.

机器之心 (Jiqizhixin) 2026-06-03 15:35
After raising nearly 5 billion yuan in three months, Qianxun Intelligence has once again topped a global real-robot leaderboard.

On June 1st, Jensen Huang spent a significant amount of time at GTC discussing Physical AI and embodied intelligence, and announced Cosmos 3. NVIDIA describes it as its latest cutting-edge model for Physical AI and the world's first fully open all-around model, with native capabilities for visual reasoning, world generation, and action generation.

Jensen Huang proudly stated that Cosmos 3 ranks first among open models on major global rankings.

However, just one day later, the RoboArena ranking was updated, and Spirit v1.6 from the Chinese company Qianxun Intelligence overtook Cosmos 3 to claim the global top spot.

Why is RoboArena worth paying attention to?

Because it addresses a core problem in how robot foundation models are evaluated today: many models perform well in simulation or on static benchmarks, but struggle to reproduce that performance reliably when faced with real robots, real objects, and real errors.

RoboArena can be understood as the embodied-robotics counterpart of LMArena. While LMArena compares the quality of large-model answers, RoboArena compares how well robot policies complete tasks in the real world. RoboArena was initiated by institutions including UC Berkeley, Stanford, and NVIDIA, and the accompanying paper was selected for an oral presentation at CoRL 2025.

Specifically, RoboArena's mechanism rests on four elements: distributed collaboration, double-blind duels, Elo-style dynamic ranking, and an open evaluation network. Distributed collaboration expands the coverage of tasks and environments; double-blind duels reduce subjective bias in evaluation; Elo-style ranking keeps the leaderboard continuously updated, as in a sports league; and the open evaluation network allows more models to be tested in the same real-world arena.
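
To make the "Elo-style dynamic ranking" idea concrete, here is a minimal sketch of a classic Elo update driven by pairwise duel results. It is an illustration only: the starting rating, the K-factor, the policy names, and the duel outcomes are all hypothetical, and RoboArena's actual scoring procedure may differ from textbook Elo.

```python
from collections import defaultdict

K = 32          # assumed update step size (hypothetical, not from RoboArena)
BASE = 1500.0   # assumed starting rating for every policy (hypothetical)


def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings, a, b, score_a):
    """Apply one duel result: score_a is 1.0 (A wins), 0.5 (tie), 0.0 (B wins)."""
    delta = K * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: what A gains, B loses


# Hypothetical double-blind duel outcomes between three anonymous policies.
duels = [
    ("policy_A", "policy_B", 1.0),
    ("policy_A", "policy_C", 0.5),
    ("policy_B", "policy_C", 0.0),
]

ratings = defaultdict(lambda: BASE)
for a, b, score_a in duels:
    update(ratings, a, b, score_a)

# Print the leaderboard from highest to lowest rating.
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each duel only nudges two ratings, new results can be folded in as they arrive, which is what lets a leaderboard of this kind change from one day to the next.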

The significance of RoboArena therefore lies in pushing the evaluation of embodied intelligence from "static benchmarking" to "head-to-head competition on real robots".

In this context, Qianxun Intelligence has become the first Chinese company to top this "away-game" leaderboard, which is dominated by Silicon Valley giants and top universities. The significance goes beyond leading the ranking: it indicates that Qianxun Intelligence has entered the global first tier in multi-task execution, real-environment adaptation, and generalization.

Why did Spirit v1.6 win?

Leaderboard results are just numbers. More convincing is how Spirit v1.6 actually performs in real tasks. Let's look at several sets of double-blind comparison videos.

First group of tasks: Open a laptop

This is not a simple grasp. The robot first needs to identify the position and orientation of the laptop, then determine where to make contact, how much force to apply, and how the hand and the robotic arm should coordinate, and finally complete the opening motion. If any link in this chain goes wrong, the task may fail.

Spirit v1.6 moves more naturally and quickly completes the task of opening the laptop. In contrast, Cosmos 3 made hardly any effective attempts.

Second group of tasks: Put the capybara on the plate

This type of task tests the robot's ability to recognize, locate, and finely manipulate small objects. The robot must not only determine where the target object is, but also hold it stably after grasping and place it accurately at the specified position.

This time, Spirit v1.6 again completed recognition, grasping, and placement. Although it made a brief adjustment while grasping, the overall action chain was coherent and the task succeeded. In contrast, pi 0.5 neither recognized the target object nor completed an effective grasp.

Overall, these videos illustrate the advantages of Spirit v1.6 more intuitively than the leaderboard numbers do: it not only scores higher in the evaluation but also carries out the entire operation chain of "seeing, judging, grasping, and placing" in real tasks.

Looking back in time, this result is not unexpected.

Earlier this year, Spirit v1.5 had already taken first place in the RoboChallenge real-robot evaluation, with a score of 66.09 and a success rate of 50.33%, surpassing pi 0.5 from Physical Intelligence. Public reports indicate that v1.5 showed good stability in tasks such as continuous multi-task execution, complex instruction decomposition, object picking, flower arrangement, and object movement.

Not much time has passed between v1.5 and v1.6, yet Qianxun Intelligence has again pulled ahead, this time on RoboArena. This is the result of its continuous iteration mechanism: continuously collect real-world scenario data, continuously identify where failures occur, and continuously feed evaluation results back into training and engineering optimization.

Embodied intelligence models differ from pure software models. Simply increasing the training scale does not necessarily make them stronger. The physical world involves friction, occlusion, errors, delays, and a great deal of uncertainty. The closer we get to real scenarios, the more important engineering organization, closed-loop data capability, and iteration speed become.

The performance of Spirit v1.6 on the leaderboard shows that Qianxun Intelligence has set this closed loop in motion.

The real decisive factor lies in real-world data

At GTC, Jensen Huang repeatedly emphasized a problem: it is difficult to obtain data for Physical AI.

The reason is not complicated. Internet videos are plentiful, but most are shot from a third-person perspective. What robots really need is first-person data that records actions and their feedback. In other words, robots need not only to "see the world" but also to understand how to move, make contact with, grasp, and change objects in it.

One of the goals of Cosmos 3 is to alleviate data scarcity in robotics through Omniverse, teleoperation, and perspective reprojection. It reflects how large companies view Physical AI: in the next stage, improving model capabilities will depend not only on parameters and computing power but also on the ability to build a larger-scale, higher-quality, and more robot-action-oriented data system.

Qianxun Intelligence is answering the same question but taking a different path.

Qianxun Intelligence emphasizes the continuous accumulation of real-world data. Public information shows that the company has developed 7 generations of lightweight wearable data collection devices in-house and built a distributed data collection network across more than 100 cities nationwide, forming a complete pipeline from collection and cleaning to annotation and quality inspection. The company plans to accumulate millions of hours of real-world interaction data by the end of 2026.

Qianxun Intelligence's wearable data collection devices are collecting data simultaneously in multiple cities across the country.

This system can be understood as Qianxun Intelligence's "data pyramid".

The bottom layer is large-scale real-world interaction data. For robots to enter homes, stores, factories, and warehouses, they must understand the clutter, change, and irregularity of real spaces. Clean, standardized demonstration data from the laboratory is important, but it is not enough to cover the long-tail problems of the real world.

The data at this layer comes from more than one source. Internet videos provide general visual common sense, wearable devices record real human operation, teleoperation data helps align the model with the robot body, and rollouts in real environments continuously feed failure, correction, and recovery processes back to the model.

The middle layer is data engineering capability. More data is not automatically better: the data must be cleaned, annotated, reviewed, and actually used for training. Failure data in particular is often more valuable than successful samples in embodied intelligence. Knowing why the model missed a grasp, why an object dropped, why the robot stopped, or why it misjudged a contact point can all help the model iterate.

If the data only records "correct demonstrations", the model learns standard actions. If the data also records failures, slips, drops, interruptions, and retries, the model has a chance to learn to correct itself in uncertain environments.

The top layer is model capability and task generalization. The data ultimately has to translate into the performance of real robots. If the real-world interaction data is diverse enough and the training and evaluation loop is stable enough, the model is more likely to remain usable with unfamiliar objects, in unfamiliar environments, and on unfamiliar tasks.

The Qianxun team has previously mentioned an observation: embodied intelligence shows a capability curve similar to a scaling law. With each order-of-magnitude increase in data scale, the task success rate may step up to a higher, more stable level.

This is also why millions of hours of real-world interaction data matter. For robots, going from 90% to 99% is not a matter of running a few more experiments but of covering more objects, more complex environments, more failure recoveries, and longer action chains.
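
A small worked example shows why the gap between 90% and 99% grows with longer action chains. It assumes, purely for illustration, that a task is a sequence of steps that must all succeed, that each step succeeds independently with the same probability, and that the 90% and 99% figures are read as per-step rates. None of these assumptions comes from Qianxun Intelligence, and the function name is hypothetical.

```python
def chain_success(per_step_rate, steps):
    """Whole-task success rate when every step must succeed independently."""
    return per_step_rate ** steps


# Compare two per-step success rates across action chains of growing length.
for per_step_rate in (0.90, 0.99):
    for steps in (1, 5, 10):
        rate = chain_success(per_step_rate, steps)
        print(f"per-step {per_step_rate:.2f}, {steps:2d} steps -> {rate:.3f}")
```

Under these assumptions, a 10-step task succeeds roughly 35% of the time at 90% per step, but about 90% of the time at 99% per step, which is why small per-step gains matter so much for long chains.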

From this perspective, the results of Spirit v1.6 are essentially an external validation. They show that the scale, quality, and iteration efficiency of real-world data are becoming one of the most critical competitive variables among embodied intelligence companies.

Why did investors collectively bet nearly 5 billion yuan on Qianxun Intelligence in 3 months?

In addition to the model's achievements, what has attracted more attention to Qianxun Intelligence recently is its financing speed.

According to public information, Qianxun Intelligence has completed four consecutive rounds of financing within 3 months, raising nearly 5 billion yuan in total. Following the completion of the 1.5-billion-yuan Series A+ round, the funds will continue to go into iterating the new-generation embodied foundation model, building a global real-data infrastructure, and large-scale commercial deployment across multiple industries.

This pace of financing is uncommon in the embodied intelligence industry, and the reason is not simply that robotics is a hot sector.

What investors really care about is whether Qianxun Intelligence has formed a sustainable flywheel: real scenarios bring real data, real data improves model capabilities, model capabilities in turn support deployment in more scenarios, and more deployments continue to generate data.

Once this flywheel runs smoothly, the company's value is no longer just that of a robot hardware company or a model company; it lies in connecting scenarios, data, models, and applications.

However, financing itself does not prove that the technology is necessarily leading. What really matters is where the money will be used.

For embodied intelligence companies, funds usually go in three directions: first, expanding model training and inference infrastructure; second, building a larger-scale data collection and processing system; third, advancing real-world deployment. Qianxun Intelligence's current advantages happen to be concentrated in these three areas.

It has the Spirit series' sustained performance in third-party evaluations and a real-world data collection system, and it is also advancing deployment in scenarios such as factories, retail, and high-end manufacturing. This combination is why investors are willing to keep betting on it.

More importantly, Qianxun Intelligence does not regard commercialization as an "ancillary link" after model release but as part of data and model iteration.

According to public information, Qianxun Intelligence is pursuing global industrial cooperation with Bosch Group, using real factory environments to verify the robot's ability to execute complex industrial processes. In domestic retail, Qianxun has launched a strategic cooperation with JD.com, and the Moz robot has entered JD MALL offline stores to take on service tasks such as coffee making. In high-end manufacturing, the Xiaomo robot has been deployed on CATL's power battery PACK production line, with a daily workload three times that of a human.

Qianxun Intelligence's robots have officially taken up their duties at JD MALL as baristas.

Industrial scenarios value stability, efficiency, and safety boundaries; retail scenarios focus more on interaction, service processes, and long-term operation; manufacturing scenarios require robots to perform reliably in fast-paced, high-consistency tasks. Different scenarios generate different data and different problems, which in turn drive the model to improve in different directions.

This is the significance of Qianxun Intelligence's commercialization "golden triangle": one corner is industrial scenarios, one is real data, and one is model iteration. The three are not separate but mutually reinforcing.

For the embodied intelligence industry, the real challenge is not producing a demonstration video but enabling robots to work in real environments over the long term. Real environments continuously expose problems and generate new data. Those who enter these scenarios earlier may accumulate the training fuel needed for next-generation models earlier.

Conclusion

Competition in embodied intelligence is shifting from single-point model capabilities to overall system capability.

Whether a model can understand tasks, execute stably, and adapt to unfamiliar objects and complex environments ultimately has to be verified repeatedly in the real world. Simulation, foundation models, data collection, real-robot deployment, engineering optimization, and business scenarios can hardly decide the outcome on their own, but together they form the basis for deploying Physical AI at scale.

From RoboChallenge to RoboArena, and from Spirit v1.5 to Spirit v1.6, Qianxun Intelligence's sustained performance shows that embodied intelligence is no longer just a technology demonstration in the laboratory but is entering a more open, dynamic, and application-oriented stage of verification. Those who can establish a real-data closed loop faster, and turn scenario feedback into model progress more reliably, will have a better chance of taking the initiative in the next stage of competition.

The story of Physical AI has just begun. What really determines the industry's direction may not be a single press conference or demonstration video but whether robots can continuously complete tasks, accumulate experience, and correct errors in real scenarios, and ultimately move toward large-scale, long-term applications. Qianxun Intelligence is paving this most difficult and most crucial path step by step.

This article is from the WeChat official account