How much computing power is needed for humanoid robots?
A die-hard fan of humanoid robots, Jensen Huang has dropped another bombshell in the past few days:
The Jetson T5000, an edge computing module designed specifically for humanoid robots, delivers up to 2070 TFLOPS of computing power.
With it, chip maker Jensen Huang has lifted the edge computing power of humanoid robots to a new height.
A height at which more AI inference and real-time processing of multimodal sensor data can happen locally, without relying on the cloud.
This means the latest advances in models and sensors stand a chance of landing on humanoid robots more quickly.
These are the visible benefits that more than 2,000 TFLOPS of edge computing power could bring to the humanoid robot industry.
Of course, this is more of a derivation on paper: theoretically feasible.
So, in reality, how much computing power do current humanoid robots need?
01 Everyone Loves Robots
As the ultimate form of almost every technological product, robots, and humanoid robots in particular, occupy a unique position in the technology field and have fascinated many tech tycoons.
The devotion of two tycoons in particular has propelled humanoid robots, once a fringe pursuit, to the top of today's technology agenda.
One is the tech madman Elon Musk, the "creator" who launched commercial rockets, built satellite internet, developed self-driving cars, and is now working on brain-computer interfaces.
After officially announcing his entry into humanoid robots at Tesla's first AI Day in 2021, he produced a humanoid robot prototype in just one year.
Musk has pulled off so many miracles and done things most people would not even dare to imagine that, when he announced he would build humanoid robots, he single-handedly brought the concept, which had stumbled through repeated setbacks, back to center stage.
The other is Jensen Huang, founder of NVIDIA, the first company in the world to reach a market value of $4 trillion.
Huang is a tough-minded manager and an entrepreneur keenly attuned to technology. In NVIDIA's corporate history, he bet on two important technology trends while they were still in their infancy:
One is artificial intelligence. In 2014, before deep learning had become a mainstream trend, Huang took notice of AI and believed it was the future.
In the years that followed, NVIDIA's GPUs became a powerful tool for American professors training neural networks. At GTC 2015, Huang went so far as to declare, "We are not a hardware company. We are an AI company."
Artificial intelligence was thus written into NVIDIA's corporate strategy.
The other is robots. NVIDIA's Jetson series of computing platforms, built for robotics, predates the current wave of humanoid robots.
The first-generation product, the Jetson TK1, was released in 2014 and marked the starting point of NVIDIA's strategic shift toward embedded AI and robotics.
Over the following decade, the Jetson line has evolved continuously, from the original Jetson TK1 at less than 1 TFLOPS to today's Jetson AGX Thor at 2070 FP4 TFLOPS.
Along the way, Xavier, Orin, and Thor became the three generations of platforms that left the deepest mark on NVIDIA's robotics business.
Take the Jetson AGX Xavier: JD.com and Meituan built their own logistics and delivery robots on it, and mainstream industrial robot makers such as Fanuc have used it in their robotic arms.
Later, Orin-series products with around 100 TFLOPS of computing power became the AI compute platforms behind the star products of domestic humanoid robot companies such as Zhipu and Unitree.
In a sense, if Elon Musk has shown the world the commercial value of humanoid robots, then Jensen Huang has steadily improved robot compute platforms, giving humanoid robots an ever more powerful edge computing foundation.
Yet even Huang, fond of robots as he is, apparently finds the plain concept of "robots" not sexy enough, or at least not distinctive enough for now. So he coined a new one: Physical AI.
Set against NVIDIA's absolute dominance of the virtual world, Physical AI reflects Huang's even greater ambition.
02 Both High Computing Power and Small Models
How much edge computing power do humanoid robots need?
It is a question I have kept raising in conversations with industry experts over the past few months, and one that is destined to have no single answer.
Judging by the edge compute carried by humanoid robots currently on the market, most sit at around 100-200 TFLOPS.
That is not because 100 TFLOPS is the ceiling for humanoid robots, but because it is already more than enough for what today's humanoid robots do.
There is a consensus about humanoid robot skills today: they remain at the stage of simple actions such as grasping and sorting, while gradually breaking through on long-horizon tasks with the support of embodied models.
For training on and executing such tasks, 100 TFLOPS of AI inference compute is basically sufficient.
Once more complex multi-sensor processing and fusion, or end-to-end models with larger parameter counts, come into play, 100 TFLOPS starts to feel stretched; the usual answer so far has been to offload to cloud compute.
Of course, there is another idea, one bound to become the mainstream technical path: make the edge models smaller.
Not long ago Boston Dynamics, the internet celebrity of the robot world, posted a video of Atlas's latest progress. Running an end-to-end LBM (large behavior model), Atlas can already handle grasping, sorting, and folding tasks reliably under a variety of disturbances.
According to Boston Dynamics, the LBM in question is a 450-million-parameter Transformer-based model. Trained with a flow-matching objective, it converts inputs including 30 Hz images, proprioception, and language instructions into the action commands that drive Atlas's movements.
Next to models with billions or tens of billions of parameters, a 450-million-parameter model counts as small. The lighter computational load frees up more of the robot's compute for real-time data collection and processing.
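To make the flow-matching idea concrete, here is a minimal, hypothetical sketch of such a policy in PyTorch: a small Transformer fuses image features, proprioception, and a language embedding, and a velocity head denoises a chunk of future actions. All class names, dimensions, and the training step below are illustrative assumptions, not Boston Dynamics' actual LBM implementation.

```python
# Minimal sketch of a flow-matching action policy, loosely in the spirit of the
# Atlas LBM description above. Everything here (names, sizes, training loop) is
# an illustrative assumption, not the real system.
import torch
import torch.nn as nn

class FlowMatchingPolicy(nn.Module):
    def __init__(self, img_dim=512, prop_dim=64, lang_dim=256,
                 act_dim=32, chunk_len=16, d_model=256):
        super().__init__()
        # Project each modality into a shared token space.
        self.img_proj = nn.Linear(img_dim, d_model)    # image features (e.g. from a frozen vision encoder)
        self.prop_proj = nn.Linear(prop_dim, d_model)  # proprioception (joint angles, IMU, ...)
        self.lang_proj = nn.Linear(lang_dim, d_model)  # language-instruction embedding
        self.act_proj = nn.Linear(act_dim, d_model)    # noisy action tokens being denoised
        self.time_proj = nn.Linear(1, d_model)         # flow time t in [0, 1]
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.velocity_head = nn.Linear(d_model, act_dim)
        self.chunk_len = chunk_len
        self.act_dim = act_dim

    def forward(self, img, prop, lang, noisy_actions, t):
        # One token per modality plus one token per action step in the chunk.
        ctx = torch.stack([self.img_proj(img),
                           self.prop_proj(prop),
                           self.lang_proj(lang),
                           self.time_proj(t)], dim=1)
        act_tokens = self.act_proj(noisy_actions)           # (B, chunk_len, d_model)
        h = self.backbone(torch.cat([ctx, act_tokens], dim=1))
        return self.velocity_head(h[:, -self.chunk_len:])   # predicted velocity field

    @torch.no_grad()
    def sample(self, img, prop, lang, steps=10):
        # Euler integration from pure noise (t=0) to an action chunk (t=1).
        B = img.shape[0]
        x = torch.randn(B, self.chunk_len, self.act_dim)
        for i in range(steps):
            t = torch.full((B, 1), i / steps)
            x = x + self.forward(img, prop, lang, x, t) / steps
        return x

# Flow-matching training step: regress the velocity (a1 - a0) along the
# straight-line path between noise a0 and the demonstrated action chunk a1.
def training_step(policy, img, prop, lang, expert_actions):
    B = expert_actions.shape[0]
    a0 = torch.randn_like(expert_actions)
    t = torch.rand(B, 1)
    xt = (1 - t[:, :, None]) * a0 + t[:, :, None] * expert_actions
    v_pred = policy(img, prop, lang, xt, t)
    return ((v_pred - (expert_actions - a0)) ** 2).mean()
```

The point of the sketch is scale: inference is a handful of forward passes through a few-hundred-million-parameter network per action chunk, a load that a 100 TFLOPS-class edge module can plausibly handle while leaving headroom for sensor processing.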
In fact, it is not only Boston Dynamics: NVIDIA, even as it keeps raising the compute ceiling of its platforms, is actively championing small edge models.
In a recent paper titled "Small Language Models are the Future of Agentic AI," NVIDIA researchers argue that small models, paired with careful use of hardware resources and agent-planning design, can carry out agent tasks more efficiently.
Most agents lean on a large model for tool invocation, task decomposition, process control, reasoning, and planning. Yet much of actual execution consists of simple, repetitive subtasks that do not need a large model at all; what matters is picking the right tool for each subtask.
The NVIDIA researchers argue that rather than pushing all of this through a general-purpose large model, it is better to use multiple small models, each professionally fine-tuned for its specific task.
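As a rough illustration of that argument, the sketch below dispatches each subtask to a small specialist and only falls back to a costlier general model when no specialist covers the subtask. The task names, router, and handlers are hypothetical stand-ins, not the paper's implementation.

```python
# Illustrative "many small specialists, one fallback generalist" dispatch pattern.
# Task kinds, handlers, and outputs are made up for the example.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Subtask:
    kind: str      # e.g. "language_parsing", "grasp_planning", "tool_call"
    payload: dict

# Each subtask kind maps to a small model fine-tuned just for that job.
# Here the "models" are stub functions standing in for real fine-tuned networks.
SPECIALISTS: Dict[str, Callable[[dict], dict]] = {
    "language_parsing": lambda p: {"plan": f"parsed: {p['instruction']}"},
    "grasp_planning":   lambda p: {"grasp_pose": [0.1, 0.2, 0.3]},
    "tool_call":        lambda p: {"result": f"called {p['tool']}"},
}

def run_agent(subtasks: List[Subtask]) -> List[dict]:
    """Dispatch each subtask to its specialist; fall back to a (more expensive)
    general model only for kinds no specialist covers."""
    results = []
    for task in subtasks:
        handler = SPECIALISTS.get(task.kind)
        if handler is None:
            results.append({"note": f"fallback to general model for {task.kind}"})
        else:
            results.append(handler(task.payload))
    return results

if __name__ == "__main__":
    plan = [Subtask("language_parsing", {"instruction": "sort the parts"}),
            Subtask("grasp_planning", {"object": "gear"}),
            Subtask("tool_call", {"tool": "conveyor.stop"})]
    print(run_agent(plan))
```

In a real robot stack each specialist would be a fine-tuned small model rather than a stub, but the structure is the point: the expensive general model becomes the fallback, not the default.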
The approach naturally applies to humanoid robots as well, given their distinctive computing-power constraints today.
The idea sounds very "Boston Dynamics," as if it were a return to hard-coded routines, except that these are "programmed routines" living under the large-model paradigm.
Looking ahead, as inference scheduling keeps improving and large-model inference systems grow more modular, this paradigm may well become the main road to the industrialization of humanoid robots over the coming decade.
This article is from the WeChat official account "Zinc Industry". Author: Shanzu. Republished by 36Kr with permission.