Behind the viral popularity of the Spring Festival Gala robots: An industry reflection on "non-consensus"
From the "atmosphere group" to a "true top-tier star": over 14 years of technological progress, embodied intelligence has moved from the stage background to the protagonist of the era, becoming the best representative of China's new technological confidence.
Text | Meng Wen
In 1950, Turing planted the seed of "embodied intelligence" in "Computing Machinery and Intelligence".
Seventy years later, this seed sprouted with the rise of ChatGPT and the emergence of vision-language-action (VLA) models. Embodied intelligence replaced the traditional narrative of "automation" and became the new industry consensus.
As the physical carrier through which AI reaches the real world, robots have become the favorites of the era.
At the just-concluded Spring Festival Gala of the Year of the Horse, a group of embodied intelligence companies including Unitree Technology, Songyan Power, and Galaxy General appeared together, delivering a dense, nationwide lesson in popular science. Reportedly, within two hours of the gala's start, robot searches on JD rose by more than 300% month-on-month and order volume soared by 150%.
However, this is not just a victory lap full of applause; it is also a brutal gear change amid intense competition.
In the capital market, this is an unprecedented carnival: annual financing in the embodied intelligence track has soared to 744 deals totaling 73.543 billion yuan. Behind the prosperity, however, the industry is also feeling pain.
On the one hand, giants such as Tesla and Ubtech are accelerating iteration and expansion worldwide; on the other, the star startup K-Scale has regrettably exited the market, and the former unicorn CloudMinds Robotics has quietly collapsed.
Soaring valuations and restrained shipment volumes form the truest tension in embodied intelligence.
1
From top-tier star at the Spring Festival Gala to workshop co-worker
At the Spring Festival Gala of the Year of the Horse, embodied intelligence dominated the visual center in an unprecedented way.
Unitree Technology's G1 robot set the whole venue on fire with "Wu BOT", pushing athletic limits with consecutive single-leg backflips and vaulting jumps two to three meters high; Songyan Power's "Bionic Cai Ming" reproduced makeup and lip-sync at the pixel level, making it almost indistinguishable from the original; Magic Atom's MagicBot Z1 formed a dance troupe and performed difficult moves such as Thomas spins on stage alongside celebrities.
From the hundred panda robot dogs dancing in sync at the Yibin, Sichuan branch venue to the scenario-based demonstrations by Galaxy General and Dreame, the sheer concentration of robots led netizens to jokingly call it the "First AI Spring Festival Gala".
Fourteen years ago, when robots first appeared at the Spring Festival Gala, they were merely background dancers in the "atmosphere group", capable only of simple movements. Now they hold the center position ("C-position") on stage and, with deeply evolved perception and interaction abilities, have become the gala's well-deserved top-tier stars.
A more profound transformation is taking place in the factory workshops behind the spotlight.
At the beginning of 2026, Zhipu Robotics announced that more than 5,000 units had cumulatively rolled off its production line and that it was sprinting toward an annual target of tens of thousands of units. Its "Expedition" series has accumulated more than 1 million working hours on automobile manufacturing and precision electronics production lines;
Ubtech proposed a production capacity plan of 10,000 industrial-grade robots and signed a strategic agreement with Airbus. The Walker S2 has officially entered the manufacturing plant and begun to take on aviation-grade precision assembly;
Xingdong Jiyuan has teamed up with SF Technology to promote large-scale deployment in high-frequency warehousing and transfer operations, turning the advantages of "legged + wheeled" designs into logistics efficiency.
Industrial enthusiasm has quickly spilled over into the capital market. Gu Shitao, co-founder of Magic Atom, revealed that the company may have news on the secondary market as early as 2026 and is arranging its listing schedule as fast as possible. Leju Intelligence and Deep Robotics, which have completed their shareholding reforms, have also officially launched the listing process.
After frantically deploying large models in 2024, Internet giants such as Meituan, Alibaba, JD, and Tencent collectively rushed into the embodied intelligence track in 2025. Advanced manufacturing and industrial giants, represented by CATL and automobile OEMs, have also placed their bets...
From laboratory demos to factory orders, from capital narratives to commercial realization, embodied intelligence seems to have crossed the critical line of technology verification and to be speeding toward the eve of large-scale mass production.
Image source: WeChat official account of Galaxy General Robotics
Policy support has also shifted from macro guidance to precise intervention. At the end of 2025, the "Implementation Plan for the Digital Transformation of the Automobile Industry", jointly issued by four departments including the Ministry of Industry and Information Technology, explicitly proposed promoting the large-scale application of intelligent robots in welding, painting, assembly, and other processes, and building "embodied intelligence demonstration production lines".
However, a deep gulf separates the ideal from the reality. Jiang Lei, chief scientist of the National-Local Joint Innovation Center for Humanoid Robots, said frankly that the industry is currently more like building "consumer-grade product reserves", and that annual production does not dare exceed 10,000 units because "producing too many has no use, and the after-sales pressure will also be very high".
Wang He, the founder of Galaxy General, put it even more bluntly: the number of robots truly operating in human work scenarios worldwide may be fewer than 1,000.
Tesla's Optimus V3 is scheduled for release in Q1 this year, with an ambitious production capacity target of 100,000 units by the end of the year, 1 million units in the long term, and a target price of $20,000. Even so, its schedule has slipped by about 8 months from the original plan.
The core bottlenecks are the mass-production stability of the 22-degree-of-freedom dexterous hand under extreme working conditions and the engineering challenge of liquid-cooled heat dissipation during high-power operation.
Capital carnival and industrial anxiety are intertwined. This "tear" stems not only from the public attention the Spring Festival Gala stage show generated but also from the fact that embodied intelligence is full of "non-consensus" in hardware, algorithms, and even the choice of commercialization path.
2
Breaking through the paradigm
Shifting gears at full speed amid "non-consensus"
Embodied intelligence means giving a machine a "body" and a "brain": the machine perceives the physical world through sensors, uses algorithms such as large models to understand the environment and plan actions, and then drives joints and motors to complete tasks. In simple terms, it means making robots "see, understand, and act" like humans.
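The "see, understand, act" cycle described above can be sketched as a simple control loop. This is an illustrative sketch only: the names `Observation`, `perceive`, `plan`, `act`, and `run_episode`, the one-dimensional world, and the 0.25 m step limit are all assumptions introduced here, not any vendor's architecture.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the robot 'sees': distance to the target in metres (hypothetical sensor)."""
    distance_to_target: float


def perceive(true_distance: float) -> Observation:
    """See: turn the state of the world into an observation (noise-free in this toy)."""
    return Observation(distance_to_target=true_distance)


def plan(obs: Observation, max_step: float = 0.25) -> float:
    """Understand: decide how far to move, capped by an assumed actuator limit."""
    return min(obs.distance_to_target, max_step)


def act(true_distance: float, step: float) -> float:
    """Act: apply the motion and return the new distance to the target."""
    return true_distance - step


def run_episode(start_distance: float, tolerance: float = 1e-6, max_ticks: int = 100) -> int:
    """Run the perceive -> plan -> act loop; return the number of ticks used."""
    distance = start_distance
    for tick in range(max_ticks):
        obs = perceive(distance)
        if obs.distance_to_target <= tolerance:
            return tick
        distance = act(distance, plan(obs))
    return max_ticks


if __name__ == "__main__":
    print("ticks needed:", run_episode(1.0))
```

In a real system each of the three functions would be replaced by far heavier components (sensor fusion, a large model, motor controllers), but the loop structure stays the same.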
If it is abstracted as "an AI operating system with a body", the bottom layer is the hardware body, which is responsible for making the machine "move"; the next layer is the algorithmic brain, which determines "how it thinks"; the layer above that is environmental perception, which enables it to "see the world clearly and feel itself"; and finally, there is commercial operation and maintenance, which is concerned with whether the robot can "survive and make money" in the real world.
In the matter of "what kind of body to build", there are currently three routes in the industry.
Ubtech and Zhipu Robotics are committed to defining the robot's skeleton through "industrial precisionism". By developing core servo systems and precision reducers in-house across the full stack, they pursue long-term stable operation on automobile manufacturing and precision electronics production lines, trading physical reliability for the industrial sector's deep trust in a "silicon-based labor force".
Unitree Technology, Songyan Power, and Zhongqing make full use of the scale effects of the local supply chain to seek breakthroughs in "performance and cost-effectiveness". They have cut whole-machine cost from the million-yuan level to the hundred-thousand-yuan level, or even tens of thousands of yuan, lowering the barrier to entry and attracting large numbers of developers and geeks to build an ecosystem first in non-standard scenarios.
Image source: WeChat official account of Songyan Power. The picture shows the "Little Rascal N2" shaking hands with CES exhibitors.
Galaxy General and Deep Robotics want to prove that the humanoid form is not the only solution for physical operations. The former opts for a wheeled chassis with two arms and focuses on warehousing, retail, and some heavy-load industries; the latter sticks to a combination of quadruped and humanoid designs and, by adapting to terrain, aims to lead in scenarios such as power inspection, utility tunnels, and emergency rescue.
This difference in routes also reflects a divergence in business philosophy. Some insist on a vertical full stack, building everything in-house from servos, motors, and reducers to the whole machine and on up to upper-layer control and large models, in order to gain long-term barriers and bargaining power; Ubtech's Walker S2 is one example.
Others choose to open up modules, turning the body into a standard platform with open interfaces so that third parties can "install brains and applications", and making money from shipment volume and the ecosystem; the open platform launched by Zhipu Robotics is one example.
Further up the stack, the history of the brain algorithm is almost a history of technological paradigm shifts. Early simulation-to-reality transfer techniques lowered the initial cost of model training, but when faced with friction, deformation, and complex noise in the real physical world, they fell into cumulative error over long-sequence operations, "making more and more mistakes" in reality.
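One way to see why long-sequence operations are so unforgiving is a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that every step of a task succeeds independently with the same probability; real robot errors are correlated, so the numbers are intuition rather than measurement. The function name `chain_success` is hypothetical.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that all steps succeed, ASSUMING independent, identical steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    for steps in (1, 10, 50, 100):
        overall = chain_success(0.99, steps)
        print(f"{steps:>3} steps at 99% each -> {overall:.1%} overall")
```

Under these assumptions, a 99% per-step success rate falls to roughly 37% over a 100-step sequence, which is why small sim-to-real gaps become large failures on long tasks.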
Later, VLA (Vision-Language-Action) large models that incorporate general Internet corpora became the mainstream, giving robots strong semantic understanding and task decomposition abilities. From Google's RT-2 to Physical Intelligence's π series, and on to GEN-0, GR00T, and others, VLA models have greatly lowered the barrier to human-machine interaction.
VLA is good at interweaving complex image and language information and inferring actions from learned "patterns". However, its structural shortcomings have also emerged: in delicate physical operations that depend on force feedback, VLA often struggles to predict consequences accurately, for example when asked to "put a cup on the edge of the table" while "neither letting it slide off nor spilling the water".
Zhao Mingguo, a professor in the Department of Automation at Tsinghua University, believes that the VLA model the industry is so keen on is a transitional technique rather than the ultimate solution. He noted that the success of large language models stems from the "standardization" and "massiveness" of human language data, whereas visual and tactile data in the physical world are "very non-standard", so that success cannot simply be copied.
Image source: Daxiao Robotics
Recently, the industry has looked for its breakthrough in the WAM (World Action Model), a world-model approach. This new paradigm requires robots to simulate physical evolution in an internal imagination space before acting.
Recent studies, such as Cosmos Policy released by Stanford and NVIDIA, have shown that an embodied model may be able to generalize to and execute different tasks zero-shot. The idea is to train the robot's "physical intuition" with video generation models: first learn "how the world will evolve if a certain situation occurs", then plan "how I should act" on that basis.
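The "imagine how the world evolves, then decide how to act" idea can be illustrated with a toy planner. This is a sketch under strong assumptions and is not how Cosmos Policy or any cited system works: the hand-written `world_model` function stands in for a learned video or dynamics model, and the one-dimensional state, the `ACTIONS` set, and the names `imagined_cost` and `plan_first_action` are all hypothetical.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical discrete actions: step left, stay, step right


def world_model(position: int, action: int) -> int:
    """Toy stand-in for a learned model: predict the next state for an action."""
    return position + action


def imagined_cost(position: int, goal: int, plan: tuple) -> int:
    """Roll a candidate plan forward in imagination and sum the distance to the goal."""
    cost = 0
    for action in plan:
        position = world_model(position, action)
        cost += abs(goal - position)
    return cost


def plan_first_action(position: int, goal: int, horizon: int = 3) -> int:
    """Enumerate every action sequence, rehearse each in the model, return the best first move."""
    best_plan = min(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: imagined_cost(position, goal, plan),
    )
    return best_plan[0]


if __name__ == "__main__":
    print("first action toward goal 3 from 0:", plan_first_action(0, 3))
```

Only the first action of the best imagined plan is executed; the robot would then observe the real outcome and plan again. Exhaustive enumeration works only for tiny action sets, which hints at why real world-model planning is so compute-hungry.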
This ability to "rehearse before executing" has become key to improving the success rate of robot operations. The Ctrl-World model jointly proposed by Tsinghua University and Stanford can raise the instruction-following success rate on downstream tasks from 38.7% to 83.4% using zero real-robot data, an average improvement of 44.7 percentage points.
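The 44.7 figure is the difference between the two success rates, so it is a gain in percentage points rather than a relative percentage increase. The short sketch below makes the distinction explicit; the function names are hypothetical and the inputs are the two rates quoted above.

```python
def percentage_point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change between two rates, expressed in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting rate, as a fraction (1.0 means +100%)."""
    if before_pct == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (after_pct - before_pct) / before_pct


if __name__ == "__main__":
    before, after = 38.7, 83.4  # success rates quoted in the text
    print(f"gain: {percentage_point_gain(before, after):.1f} percentage points")
    print(f"relative improvement: {relative_gain(before, after):.1%}")
```

For the quoted rates, the gain is 44.7 percentage points, which corresponds to a relative improvement of roughly 115% over the 38.7% baseline.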
The potential of the world model lies in fundamentally reducing operation errors, but the data volume, computing power, and engineering complexity it requires far exceed those of earlier approaches. NVIDIA's DreamZero, for example, relies on a computing cluster of top-tier chips such as the H100 or GB200 for parallel inference, a computing cost that is currently completely unacceptable for standalone robots deployed at the edge. The field is at a stage where "scientific research highlights" and "engineering exploration" coexist.
This divergence in technical paths also extends to the choice of "intelligence source": whether to plug in general large models such as GPT-4o and Gemini to "borrow intelligence", or to train an embodied-native model from scratch as domestic companies such as Yuanli Lingji do, has become contested high ground for teams with different technical backgrounds.
The "emergence" of intelligence cannot be separated from a supply of high-quality data, which is the job of the environmental perception layer. Chen Yilun, CEO of Shizhihang, once noted that because of the complexity of the tasks embodied intelligence faces, the data required for product-level iteration is more than ten times that of autonomous driving. Wang Qian, the founder of Zibianliang, also reminded