
Zhang Yiming and Jensen Huang are on the same wavelength.

Zimubang 2025-09-17 18:03
ByteDance has robots, while NVIDIA has a technology stack.

Embodied intelligence has become one of the hottest industries. Unitree Robotics, a leading Chinese company in the field, is pressing ahead with its public listing, while tech giants in China and the United States are making longer-term plans at a deeper level.

In March this year, at the GTC 2025 keynote speech, Jensen Huang, the CEO of NVIDIA, demonstrated the NEO Gamma humanoid robot from Norwegian robotics company 1X. This robot uses a post-training strategy built on NVIDIA's GR00T N1 model and performs autonomous tidying tasks.

With this demonstration, Jensen Huang sought to show that the future of humanoid robots lies in adaptability and the ability to learn. In other words, how good a robot is depends on whether its "brain" can adapt to its environment and learn new things.

Jensen Huang's words quickly came true. A few months later, global tech giants achieved new results in the area of "robot brains".

In August, NVIDIA launched the Jetson AGX Thor, an edge computing platform that can run multiple generative AI models simultaneously on a robot's body. It is widely recognized in the industry as the "new robot brain".

The Jetson AGX Thor (hereinafter Thor) is a new-generation technology stack for robots and physical devices. This brand-new "robot brain" is based on the Blackwell GPU architecture, with peak compute of 2070 FP4 TFLOPS. Its AI performance is up to 7.5 times that of the existing Jetson AGX Orin module, and its energy efficiency is 3.5 times better.
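As a rough back-of-the-envelope check, the figures quoted above can be combined. The sketch below takes the "up to" ratios at face value, which is an assumption, and the variable names are illustrative rather than anything from NVIDIA.

```python
# Back-of-the-envelope arithmetic using only the figures quoted in the article.
# Assumption: the "up to" ratios are treated as exact and apply at peak load.
thor_peak_fp4_tflops = 2070.0   # quoted peak compute of Thor
perf_ratio_vs_orin = 7.5        # quoted AI performance ratio versus Jetson AGX Orin
efficiency_ratio_vs_orin = 3.5  # quoted energy-efficiency (performance per watt) ratio

# Baseline implied by the performance ratio, in the same nominal units.
implied_orin_baseline = thor_peak_fp4_tflops / perf_ratio_vs_orin

# power = performance / efficiency, so the two ratios divide the same way.
implied_power_ratio = perf_ratio_vs_orin / efficiency_ratio_vs_orin

print(f"Implied Orin baseline: {implied_orin_baseline:.0f} (nominal units)")
print(f"Implied peak power ratio: {implied_power_ratio:.2f}x")
```

Under these assumptions, performance grows faster than efficiency, so peak power draw would be roughly twice that of the previous generation. Vendor ratios of this kind often compare different numeric precisions, so the result is only indicative.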

Thor's biggest breakthrough is that robots that previously relied on cloud processing or multiple chips can now perform real-time perception and decision-making on a single compact module.

Early adopters, including Amazon for its warehouse robots and Boston Dynamics, have started integrating Thor into their products, aiming for more intelligent and autonomous robots.

On the other side of the world, ByteDance launched its self-developed general robot models GR-3 and Robix in July and September respectively, demonstrating the ability to perform complex tasks in real-life home scenarios.

At the end of July, ByteDance's Seed team released a demonstration video of a robot running ByteDance's latest vision-language-action (VLA) model, GR-3. In the video, the robot inserted a hanger into a shirt and hung it up.

Last week, Seed unveiled its latest robotics research result, Robix. Together with GR-3, it forms ByteDance's new-generation lineup of robot models.

Only half a year has passed since Jensen Huang's remarks at GTC 2025. The two giants' "synchronized" moves in robotics give the impression that Jensen Huang and Zhang Yiming are on the same page this time.

A

Even before Thor arrived, NVIDIA was already the leader in robotics.

NVIDIA's Jetson platform dominates the high-end robotics and autonomous machine development field. Its ecosystem has more than 2 million developers, and more than 7,000 companies use the previous-generation Orin series products.

In August this year, NVIDIA announced that Thor was officially on sale. On hardware performance and maturity alone, it is well ahead of nearly all comparable products.

The edge AI products (AI that runs locally on the device) of competitors such as Intel and Qualcomm currently lag behind and cannot deliver the same level of integrated computing in a single module.

More importantly, NVIDIA tightly couples its hardware with its software stack, and the widespread adoption of the CUDA toolkit gives it an ecosystem moat.

Thor can directly call NVIDIA's complete Isaac robot software platform, AI model library, and simulation tools to achieve deep end-to-end integration. This includes NVIDIA's latest model for robot scenarios, Isaac GR00T N1, an open-source, pre-trained, and customizable foundation model.

GR00T N1 uses a dual-system architecture inspired by human cognition. One system is the "fast-thinking action model", whose behavior is similar to human reactions and intuitions; the other system is the "slow-thinking model", which can reason about the surrounding environment and received instructions to plan actions.
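The division of labor described above can be sketched as a two-rate control loop: a slow system that replans occasionally and a fast system that acts on every tick. This is only a toy illustration of the idea. The function names, the one-dimensional "robot", and the replanning interval are all assumptions made for the example, not GR00T N1's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    goal: str        # textual goal produced by the slow system
    revision: int    # how many times the plan has been (re)issued


def slow_system(instruction: str, obs: dict, revision: int) -> Plan:
    """Hypothetical 'slow-thinking' stand-in: reasons over instruction and scene."""
    return Plan(goal=f"{instruction} at {obs['target_name']}", revision=revision)


def fast_system(plan: Plan, obs: dict) -> float:
    """Hypothetical 'fast-thinking' stand-in: reacts to the latest observation.

    A real action model would take the plan into account; this toy version
    just takes a proportional step toward the target position.
    """
    return 0.5 * (obs["target_pos"] - obs["pos"])


def run(instruction: str, ticks: int = 10, replan_every: int = 5):
    obs = {"target_name": "table", "target_pos": 1.0, "pos": 0.0}
    plan = None
    replans = 0
    for t in range(ticks):
        if t % replan_every == 0:          # slow loop: runs only occasionally
            plan = slow_system(instruction, obs, replans)
            replans += 1
        obs["pos"] += fast_system(plan, obs)  # fast loop: runs on every tick
    return replans, obs["pos"]


replans, final_pos = run("tidy up")
print(replans, final_pos)
```

The point of the split is that the expensive reasoning step does not have to keep pace with the motor loop; in this sketch it runs once for every five actions.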

At GTC 2025, Bernt Børnich, the CEO of 1X, said: "While we were developing our autonomous model, NVIDIA's GR00T N1 significantly improved the robot's reasoning ability and skills. We fully deployed NEO Gamma with very little post-training data."

However, Thor is not perfect.

First, there is the price. The Jetson AGX Thor development kit costs $3,499, which puts it out of reach for ordinary household products.

In addition, the Thor platform has higher power requirements, so it is best suited to products with a stable power supply, such as self-driving cars, factory robots, and delivery robots.

B

In the "robot brain" track, NVIDIA soon had a Chinese competitor.

At the end of July, ByteDance released its new-generation robot VLA model, GR-3. In the official demonstration, the ByteMini robot running GR-3 inserted a hanger into a shirt and hung it up, and also completed demanding tasks such as picking up household items and placing them in designated positions.

In addition, ByteMini can distinguish items of different sizes and successfully execute the instruction to pick up the "larger plate".

A close look at the demonstration and the technical report shows that GR-3 can understand complex and abstract language, such as "larger plate" and "left chair".

In addition, GR-3 has strong few-shot adaptation ability. According to foreign media reports, the Seed team trains it with a hybrid method in three stages: first, GR-3 is trained on a large volume of image and text data; next, it is fine-tuned through human-machine interaction in a virtual reality environment; finally, it learns by imitating the actions of real-world robots. This training strategy helps GR-3 stay adaptable in complex and unpredictable environments.

GR-3 reportedly has more parameters than the GR00T series and performs better in practical applications.

Chris Paxton, an AI scientist who once worked at Meta, noted in a study on VLA models in robotics that ByteDance's 4-billion-parameter GR-3 model seems to perform better than NVIDIA's GR00T, which has about 2 billion parameters. It is reasonable to assume that the "scaling laws" will continue to hold given sufficient data and computing power.

GR-3 has given ByteDance a place in the "robot brain" track. However, a VLA model works mainly at the execution level. Rather than a "brain" in the full sense, GR-3 is more like the "neurons" in a robot's limbs.

The latest achievement of ByteDance's Seed team fills in another piece of the "robot brain" puzzle.

Last week, the Seed team released its latest achievement, Robix, which handles task planning, reasoning, and natural language interaction in the robot system.

Seed researcher Dong Heng described Robix on his personal homepage as "a unified robot brain that integrates reasoning, planning, and natural interaction. Its performance is better than GPT-4o and Gemini 2.5 Pro."

However, Robix is not the complete form of the "robot brain".

According to Robix's technical documentation, the "body movement/execution" part of the process is usually handled by a low-level controller model, that is, a VLA model such as GR-3 or a similar controller. In other words, GR-3 and Robix need to work together to drive a robot.
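The cooperation described above can be pictured as a high-level planner that breaks an instruction into subtasks and hands each one to a low-level controller, checking the result before moving on. The sketch below is purely illustrative: `plan_subtasks`, `vla_controller`, and the " then " splitting rule are hypothetical stand-ins invented for this example, not the Robix or GR-3 interfaces.

```python
from typing import Callable, List, Tuple


def plan_subtasks(instruction: str) -> List[str]:
    """Hypothetical planner stand-in: splits an instruction into ordered subtasks."""
    return [part.strip() for part in instruction.split(" then ") if part.strip()]


def vla_controller(subtask: str) -> bool:
    """Hypothetical controller stand-in: reports whether a subtask succeeded.

    A real VLA model would map camera images and the subtask text to motor
    actions; this toy version fails only on subtasks marked 'impossible'.
    """
    return "impossible" not in subtask


def execute(
    instruction: str,
    planner: Callable[[str], List[str]] = plan_subtasks,
    controller: Callable[[str], bool] = vla_controller,
) -> List[Tuple[str, bool]]:
    log = []
    for subtask in planner(instruction):
        ok = controller(subtask)
        log.append((subtask, ok))
        if not ok:
            break  # a real high-level layer might replan or ask the user here
    return log


for step, ok in execute("pick up the larger plate then place it on the table"):
    print(step, "->", "done" if ok else "failed")
```

Keeping the planner and controller behind plain function boundaries reflects the point in the text that the controller could be GR-3 or a similar model.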

Robix's approach has something in common with the dual-system design of NVIDIA's GR00T N1: one system is responsible for reasoning, and the other for execution.

In the official test, Robix, with GR-3 as its controller, completed tasks such as table cleaning, cashier bagging, and beverage screening on the ByteMini robot. In a side-by-side comparison with other models, it lagged slightly behind Gemini 2.5 Pro only in the beverage screening task and scored highest in the other tasks.

C

Robix and GR-3 are not ByteDance's first "show of strength" in robot models. Many people regard ByteDance as an Internet content company, but in the robotics field, it is actually a low-key "hidden giant".

In December 2023, the Seed team launched GR-1, a forward-looking experiment by Seed in the field of robot VLA. As a technology verification product, GR-1 follows a recipe of large-scale generative video pre-training followed by seamless fine-tuning on robot data.

Building on the experience gained with GR-1, the Seed team launched GR-2 in 2024. GR-2 is pre-trained on 38 million video segments and more than 50 billion tokens of data, and its action/video generation is then fine-tuned with robot trajectories. Its average success rate across more than 100 manipulation tasks is as high as 97.7%.

GR-3 and Robix are the latest steps in ByteDance's expansion of its research agenda in the era of large models.

A 2023 report in LatePost revealed that ByteDance's exploration of robotics began in 2020. At that time, Zhang Yiming showed interest in robots and occasionally joined discussions on robot projects.

Two years on, ByteDance has quietly mass-produced more than 1,000 robots. These wheeled logistics robots are mainly used to transport packages and parts in warehouses and on production lines, as part of an integrated "warehousing + automatic handling" solution. They can learn independently, plan routes, and move to destinations, serving ByteDance's own Douyin e-commerce warehouses and external customers such as SF Express and BYD.

However, these logistics robots mainly represent early-stage technical groundwork. The development path of Robix, GR-3, and ByteMini makes it clear that ByteDance is committed to leading the field of embodied intelligence.

Recently, a number of robotics-related positions have appeared on ByteDance's recruitment website, and some explicitly mention "next-generation general robots". All of these positions belong to the Seed team and are based in Beijing and Shanghai. A July report in the South China Morning Post revealed that the Seed team is expected to have more than 300 people this year.

On the other hand, ByteDance is also actively investing in the robotics industry.

Earlier, Unitree Robotics, a leading domestic company in embodied intelligence, completed its Series C financing at a valuation of more than 10 billion yuan. Alongside names from Alibaba and Tencent, the list of investors in this round also included the Jinqiu Fund, which has deep ties to ByteDance.

The Jinqiu Fund was founded in 2022 by Yang Jie, the former head of ByteDance's financial investment. Most of the core members of the team come from ByteDance's investment system. Its name "Jinqiu" is derived from Jinqiu Garden in Haidian District, Beijing, where Zhang Yiming and ByteDance started their business.

D

ByteDance is accelerating its push into robotics both inside and outside the company. For now, however, its technical strength lies mainly on the model side, at the "robot brain" level, while NVIDIA's latest chip solution appears to complement it.

For many years, ByteDance has been one of NVIDIA's most important customers in China, and Jensen Huang is also well aware of the important position of Chinese companies in the embodied intelligence market.

In July this year, he appeared at the opening ceremony of the China International Supply Chain Expo in Beijing and said in his speech: "The next wave of AI will be robotics. In the future, robots will not only be able to reason and execute but also truly understand the physical world."

For Jensen Huang, the Chinese market is irreplaceable on this path.

NVIDIA's official blog shows that several domestic companies have already used Thor, including United Imaging Healthcare, Wanjie Technology, Ubtech Robotics, Galaxy General Technology, Unitree Robotics, Zhongqing Robotics, and Zhiyuan Robotics. However, ByteDance's name has not yet appeared on this long list.

The NVIDIA blog also quoted the words of Wang Xingxing, the CEO of Unitree Robotics: "Jetson Thor has brought a huge leap in computing power, enabling robots to have stronger agility, faster decision-making, and a higher level of autonomy, which is crucial for robots to navigate and interact in the real world."

In January 2025, at the CES consumer electronics show, Jensen Huang took the stage with 14 partner humanoid robot companies, 6 of them from China, including Unitree Robotics and XPeng.

On the other hand, ByteDance's Seed team is not focused only on robot models.

Alongside the tests of GR-3 and Robix, ByteDance also introduced ByteMini. Although it looks more like an experimental platform for testing, the Seed team's technical reports show that its specifications are far from modest. It has 22 degrees of freedom, and its spherical wrist design gives it strong manipulation ability in confined spaces and high-dexterity tasks.

ByteMini already shows ByteDance's ambition to create next-generation embodied intelligence products. And now that NVIDIA has come up with a new-generation robot chip solution, Jensen Huang and Zhang Yiming, who are on the same page, seem likely to extend the cooperation between their two companies into robotics.

This article is from the WeChat official account “Zimubang” (ID: wujicaijing), author: Li Zhaofeng, published by 36Kr with permission.