Are dedicated robot chips a false proposition? Intel's Song Jiqiang: The market is still too small to turn a profit.
In the view of Song Jiqiang, vice president of Intel Labs, embodied AI must clear the "reliability" hurdle before it can truly enter factories and households. His proposed solution is to equip robots with three systems.
On January 20th, Song Jiqiang, vice president of Intel Labs and director of Intel China Research, was interviewed by several media outlets, including Phoenix Network.
"Today's embodied AI robots are like 'gifted children': they perform amazingly under ideal conditions but may be at a loss when something unexpected happens," said Song Jiqiang, describing a challenge common across the industry.
On the screen behind him was a block diagram of a three-layer system architecture: the "triple-system" solution Intel has proposed to address these challenges.
As the wave of large models triggered by ChatGPT gradually extends into the physical world, embodied AI has become the next focal point of global technology competition. From Tesla's Optimus to XPeng Motors' Iron, robots are being endowed with unprecedented understanding and decision-making abilities.
However, a "reliability" chasm separates demonstration videos from real-world scenarios. Song Jiqiang pointed out that the action-generation accuracy of current robots based on vision-language-action (VLA) models is "around 60% to 70%." Problems such as hallucinations, poor environmental adaptability, and weak long-horizon task planning remain unsolved.
"If we want it to be truly deployed in about three years without major safety-related accidents, we need to establish the relevant frameworks as early as possible and build consensus across the industry," Song Jiqiang said.
System Architecture: Equipping Robots with 'Triple Insurance'
According to Song Jiqiang's explanation, a trustworthy embodied AI system should consist of three levels: the Primary System, the Safety System, and the Fallback System.
The Primary System carries the robot's "intelligence" and is responsible for decision-making, planning, and action generation. At its core is the "neuro-symbolic AI" approach championed by Intel, which aims to combine the generalization ability of neural networks with the reliability and interpretability of symbolic logic.
"It uses the generalization ability of neural networks so that the robot is not limited to a single scenario and a single solution, while also integrating traditional methods based on symbols, rules, and knowledge," Song Jiqiang explained. This amounts to "raising the robot's lower bound," ensuring that problems such as hallucinations do not lead to catastrophic consequences.
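The neuro-symbolic idea can be illustrated with a minimal Python sketch, assuming a neural policy that returns scored candidate actions and a set of hand-written symbolic rules that can veto them. The `Action` type, the two rules, and their thresholds are hypothetical; this is not Intel's implementation, only one way the "raise the lower bound" principle could look in code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    speed_mps: float            # hypothetical: commanded end-effector speed
    holds_sharp_object: bool


# A symbolic rule returns True if the action is permitted in the given context.
Rule = Callable[[Action, dict], bool]


def slow_near_humans(action: Action, ctx: dict) -> bool:
    # Hypothetical rule: cap speed at 0.2 m/s when a person is within 1 m.
    return ctx["human_distance_m"] >= 1.0 or action.speed_mps <= 0.2


def no_sharp_objects_near_humans(action: Action, ctx: dict) -> bool:
    # Hypothetical rule: never carry a sharp object closer than 0.5 m to a person.
    return not (action.holds_sharp_object and ctx["human_distance_m"] < 0.5)


RULES: list[Rule] = [slow_near_humans, no_sharp_objects_near_humans]


def select_action(candidates: list[tuple[Action, float]], ctx: dict) -> Optional[Action]:
    """Return the highest-scoring neural proposal that passes every symbolic rule."""
    allowed = [(a, score) for a, score in candidates if all(rule(a, ctx) for rule in RULES)]
    if not allowed:
        return None  # nothing passes: defer to the safety and fallback layers
    return max(allowed, key=lambda pair: pair[1])[0]


# Scores would come from a neural policy; here they are made-up numbers.
proposals = [
    (Action("fast_reach", 0.8, False), 0.9),
    (Action("slow_reach", 0.15, False), 0.6),
]
chosen = select_action(proposals, {"human_distance_m": 0.7})
print(chosen.name if chosen else "no safe action")  # prints "slow_reach"
```

The neural side still decides what is most promising; the rules only narrow the options, so the robot keeps its flexibility while the worst proposals are filtered out.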
However, the real world is full of surprises. Actuator failures, sensor errors, unknown obstacles, and slippery floors are all beyond the cognitive boundaries of the Primary System. Therefore, more fundamental safeguards are needed.
The Safety System is a lightweight and highly reliable monitoring layer that continuously compares the robot's execution status with preset safety rules (such as "do not collide with humans" and "maintain a safe distance when holding sharp objects"). Once a deviation is detected, it will immediately issue an alarm or intervene.
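A minimal sketch of such a monitoring layer, assuming the robot exposes a few state readings every control cycle. `RobotState`, the thresholds, and the three-level verdict are illustrative assumptions, not values from the white paper.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    ALARM = "alarm"          # flag the deviation, keep running under watch
    INTERVENE = "intervene"  # override the primary system (stop, retract, slow down)


@dataclass
class RobotState:
    min_human_distance_m: float
    holding_sharp_object: bool
    joint_torque_nm: float


# Hypothetical thresholds; real ones would come from a risk assessment.
SAFE_DISTANCE_M = 0.5
SHARP_OBJECT_DISTANCE_M = 1.0
TORQUE_LIMIT_NM = 40.0


def check(state: RobotState) -> Verdict:
    """Compare one state sample against preset safety rules."""
    if state.min_human_distance_m < SAFE_DISTANCE_M:
        return Verdict.INTERVENE  # "do not collide with humans"
    if state.holding_sharp_object and state.min_human_distance_m < SHARP_OBJECT_DISTANCE_M:
        return Verdict.INTERVENE  # "keep a safe distance when holding sharp objects"
    if state.joint_torque_nm > TORQUE_LIMIT_NM:
        return Verdict.ALARM      # abnormal load, e.g. a possible actuator fault
    return Verdict.OK


for s in (RobotState(2.0, False, 10.0), RobotState(0.8, True, 10.0), RobotState(2.0, False, 55.0)):
    print(check(s).value)  # ok, intervene, alarm
```

Keeping the check this small is the point: a monitor that only compares readings against fixed rules is far easier to verify than the primary system it supervises.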
If the Safety System cannot handle the situation, for example, when the robot is about to fall, the Fallback System will be activated. Its goal is not to make the robot "stop suddenly" but to guide it into a reliable degraded state.
"For example, the robot can pull over slowly like a car; if it is about to fall, it can choose an unoccupied area and fall slowly by locking some joints," Song Jiqiang said.
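The degraded-state behavior Song describes could be sketched as follows, assuming the robot can estimate free space and occupancy in a handful of directions around it. `Zone`, the 1.2 m clearance threshold, and the action names are hypothetical placeholders rather than any real robot API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    heading_deg: int     # direction relative to the robot's current facing
    clearance_m: float   # estimated free space in that direction
    occupied: bool       # a person or fragile object detected there


def choose_fall_zone(zones: list[Zone], min_clearance_m: float = 1.2) -> Optional[Zone]:
    """Pick the unoccupied zone with the most free space, or None if none is large enough."""
    candidates = [z for z in zones if not z.occupied and z.clearance_m >= min_clearance_m]
    return max(candidates, key=lambda z: z.clearance_m, default=None)


def fallback_plan(falling: bool, zones: list[Zone]) -> list[str]:
    """Return a degraded-mode action sequence instead of an abrupt stop."""
    if not falling:
        # Analogous to a car pulling over: slow down, leave the work area, stop.
        return ["reduce_speed", "move_to_safe_spot", "hold_position"]
    zone = choose_fall_zone(zones)
    if zone is None:
        return ["lock_joints", "crouch_in_place"]  # no clear area: minimize reach
    return [f"turn_to_{zone.heading_deg}", "lock_selected_joints", "lower_slowly"]


zones = [Zone(0, 3.0, True), Zone(90, 1.5, False), Zone(180, 2.0, False)]
print(fallback_plan(True, zones))  # ['turn_to_180', 'lock_selected_joints', 'lower_slowly']
```

The heading at 0 degrees has the most space but is occupied, so the plan picks the largest unoccupied area instead.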
This "PMDF" framework (covering, respectively, the primary control system, the monitoring system, safety decision-making, and fault handling and recovery in embodied AI) has been written into the "White Paper on the Intelligent Safety Sub-system of Embodied Robots," jointly released by Intel and multiple partners. Song Jiqiang said the response since its release has been positive, and many academic and industrial organizations hope to take part in promoting it.
Special-Purpose Chips Not Yet Available, Intel Bets on 'Traditional Advantages'
When the topic turned to hardware, Phoenix Network Technology asked: Will special-purpose chips emerge in the robotics field? And with car makers such as Tesla and XPeng developing their own chips, where do Intel's opportunities lie?
Song Jiqiang's answer was candid and practical. He stated plainly that special-purpose chips are not yet economically viable. "The core reason is that the robot market is still small. For chip manufacturers, it is difficult to make a profit by customizing chips specifically for robots."
Currently, the industry generally reuses mature chips from fields such as mobile phones, cars, and PCs, adapting them for robots. A deeper reason is that the robot "workload" has not yet been settled. "We cannot determine whether the chip should be optimized for VLA workloads or support the workloads of future world models."
In this situation, general-purpose chips are the safer choice. Song Jiqiang predicted that special-purpose chips (ASICs) will only appear once the industry settles on standardized workloads, and that their R&D cycle may be 10 to 18 months.
So where do Intel's opportunities lie? Song Jiqiang pointed to Intel's long-overlooked position as a "hidden champion" in industrial control.
"In the traditional industrial automation field, Intel's market position can be described as an 'absolute advantage'... In high-precision, high-frequency motion control for industrial scenarios, most industrial control products and industrial control boards are developed based on Intel CPUs."
He summarized three major advantages: first, technology transfer, carrying industrial motion-control experience over to the motion-control layer of robots; second, resource-scheduling optimization, ensuring that millisecond-level tasks such as motion control are not disrupted by other tasks; third, multi-system integration, enabling isolated monitoring and rapid safety response.
As for chips with integrated AI compute, such as Core Ultra, Song Jiqiang regarded them as a "stable hardware foundation"; if their computing power is insufficient, dedicated AI accelerator cards can be added. He predicted that the future mainstream deployment model will be "robot terminal + edge server": where latency permits, large models will run on edge servers, forming a heterogeneous pool of computing resources across the network.
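One way to picture the "robot terminal + edge server" split is a placement rule that checks each task's latency budget. The sketch below rests on stated assumptions: the timing estimates, the preference for on-board execution, and the `Task`/`place` names are hypothetical, and a real system would also weigh power, bandwidth, and network failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    deadline_ms: float       # latency budget for one invocation
    local_ms: float          # estimated run time on the robot's own chip
    edge_compute_ms: float   # estimated run time on the edge server
    fits_on_robot: bool      # whether the model fits in on-board memory


def place(task: Task, network_rtt_ms: float) -> str:
    """Return 'robot' or 'edge' for a task, preferring on-board execution."""
    if task.fits_on_robot and task.local_ms <= task.deadline_ms:
        return "robot"  # no dependence on the network
    if network_rtt_ms + task.edge_compute_ms <= task.deadline_ms:
        return "edge"
    raise RuntimeError(f"{task.name}: no placement meets {task.deadline_ms} ms")


tasks = [
    Task("motion_control", deadline_ms=1.0, local_ms=0.2, edge_compute_ms=0.05, fits_on_robot=True),
    Task("vla_policy", deadline_ms=200.0, local_ms=900.0, edge_compute_ms=60.0, fits_on_robot=False),
]
for t in tasks:
    print(t.name, "->", place(t, network_rtt_ms=20.0))  # robot, then edge
```

Millisecond-level motion control stays on the robot because even a fast edge server cannot beat the network round trip, while a large model that does not fit on board moves to the edge as long as the round trip fits its budget.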
Real-World Bottlenecks: Data Silos, VLA Hallucinations, and Cost Cliffs
Although the blueprint is clear, the road to reliable embodied AI is full of thorns. When answering multiple questions, Song Jiqiang outlined the main bottlenecks currently faced.
First and foremost is the capability ceiling of VLA (vision-language-action) models. Song Jiqiang said bluntly that current VLA accuracy is only 60% to 70%. These models hallucinate significantly, are sensitive to changes in the visual environment, and generalize poorly. "It does not really understand the essence of the scene and lacks cognition of the three-dimensional and causal relationships among the objects in it."
This is also why the industry has turned its attention to "world models," which supplement VLA with knowledge of physical laws and causal relationships. However, world models themselves still face challenges in being applied to real-world scenarios.
A deeper and more fundamental challenge comes from data. Song Jiqiang pointed out that data problems are the core pain point of the industry. Embodied AI requires three types of data: scene understanding, task planning, and robot body data. But the current situation is that "data silos" are serious.
"The data required for different industry scenarios, robot bodies, and task types varies greatly." He listed four obstacles to establishing unified data standards: no clear definition of data completeness (for example, whether tactile data is needed); no unified requirements for operating precision and frequency; no recognized optimal robot body design; and no agreed-upon data-collection viewpoints.
"Therefore, the industry is still in the stage of independent exploration and will maintain a 'diverse' state in the short term."
The last hurdle is mass production and cost. Song Jiqiang noted that most robots at current exhibitions are "hand-built prototypes" whose components do not meet automotive-grade or industrial-grade standards, resulting in poor consistency. "Bringing down overall robot prices also depends on large manufacturers entering the market."
Taking Tesla as an example, he pointed out that one of the core reasons the industry is optimistic about it is its strong mass-production capability. Only by cutting hardware costs through industrial mass production while meeting the requirements for intelligent capabilities can robots enter broader commercial and even consumer markets.
Next Three Years: From 'Show-Off Genius' to 'Reliable Craftsman'
Facing so many challenges, what is the actual implementation schedule for embodied AI? Song Jiqiang gave a cautious prediction.
"It is estimated that it will take another two or three years to integrate these capabilities into a reliable solution and raise VLA accuracy from the current 60% to 70% to the 99%-plus required for industrial-grade applications."
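Why the gap between 60% to 70% and 99% matters becomes clearer with a simple calculation, assuming a multi-step task in which every step must succeed and steps succeed independently. This is a simplification of our own, not a model from the interview: real errors are often correlated, and some are recoverable.

```python
def task_success(p_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return p_step ** steps


for p in (0.6, 0.7, 0.99):
    print(f"per-step {p:.2f}: 10-step task succeeds {task_success(p, 10):.1%} of the time")
```

Under these assumptions, a ten-step task completes only about 2.8% of the time at 70% per-step accuracy, versus about 90.4% at 99%, which is one way to see why long tasks expose weak per-step reliability so quickly.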
He depicted a clear implementation path:
In the short term (1 to 2 years), small-scale deployment will be achieved in semi-structured scenarios such as logistics sorting, factory material handling, and standard-part assembly. These scenarios have high labor costs and relatively controllable environments, so they can absorb the high initial cost of robots.
In the medium term (about 3 years), as the reliability of intelligent capabilities improves and the industry reaches a consensus on the safety framework, the application scale will be expanded in the above scenarios.
In the long term, it depends on breakthroughs in mass - production consistency and cost control, which requires the participation of large automobile manufacturers and other enterprises with industrial production capabilities.
"This development path follows the pattern of the Gartner hype cycle," Song Jiqiang summarized. First, technological expectations attract investment and capabilities improve rapidly; then problems are solved during deployment and commercial viability is verified in early scenarios; finally, large manufacturers enter the market and drive large-scale development.
At the end of the interview, Song Jiqiang repeatedly emphasized the seemingly contradictory keywords of "integration" and "decoupling."
Integration refers to combining new and old technologies: pairing cutting-edge AI models with proven traditional control technology and safety engineering. Decoupling refers to separating software from hardware at the capability level, so that upper-layer perception and planning modules can adapt to different robot bodies, reducing development costs.
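The decoupling idea can be sketched with a hardware-abstraction interface, assuming upper-layer planning code talks to robot bodies only through a small protocol. `RobotBody`, `reach`, and the two example bodies are hypothetical names chosen for illustration.

```python
from typing import Protocol


class RobotBody(Protocol):
    """Hypothetical hardware-abstraction interface that planners depend on."""

    name: str

    def reach(self, x: float, y: float, z: float) -> str: ...


class WheeledArm:
    name = "wheeled_arm"

    def reach(self, x: float, y: float, z: float) -> str:
        return f"{self.name}: drive base closer, then extend arm to ({x}, {y}, {z})"


class Humanoid:
    name = "humanoid"

    def reach(self, x: float, y: float, z: float) -> str:
        return f"{self.name}: step, rebalance, then reach to ({x}, {y}, {z})"


def pick_up(body: RobotBody, target: tuple[float, float, float]) -> str:
    # The upper-layer task logic is written once against RobotBody, so it runs
    # unchanged on any body that implements `reach`.
    return body.reach(*target)


for body in (WheeledArm(), Humanoid()):
    print(pick_up(body, (0.4, 0.0, 0.9)))
```

Because `Protocol` uses structural typing, the bodies do not need a shared base class; a static type checker confirms they match the interface, and each body keeps its own low-level control details.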
"The development of embodied AI does not depend on a single technological breakthrough but requires the superposition and integration of new and old technologies," Song Jiqiang said. An unproven new technology cannot be directly used for critical tasks. Only by combining with mature technologies can a complete and reliable solution be formed.
This may be Intel's distinctive position in the embodied AI race: not the most radical disruptor, but the most reliable integrator. Drawing on decades of "tacit knowledge" in the industrial field, it aims to equip the fast-moving AI "gifted teenagers" with a "cerebellum" and "reflex nerves" tempered by the physical world.
When robots leave the spotlight of the exhibition stand and enter the noisy, chaotic, and uncertain real world, what determines their value will no longer be their most impressive moments but how reliably they avoid errors at their worst. And this is only the beginning of a long-term project on "reliability."
This article is from the WeChat official account "Phoenix Network Technology", author: Yu Hao, editor: Dong Yuqing. Republished by 36Kr with permission.