Professor Xiao Yanghua: How far is embodied intelligence from "emergence"?
The new wave of technologies represented by generative AI is advancing at an astonishing pace, bringing about profound technological, commercial, and social changes, and propelling human society from the information age to the intelligent age. While the world eagerly anticipates the arrival of AI, there is also great concern about the new opportunities and challenges that artificial intelligence will bring.
To address these concerns, we have launched a seminar titled "100 Questions from 100 Experts on AI & Society." We have invited a wide range of AI technology experts, founders of AI unicorn companies, AI investors, as well as sociologists, psychologists, international relations experts, and science fiction writers to engage in in-depth discussions from diverse perspectives. The aim is to explore the extensive impacts of artificial intelligence technology, identify points of both consensus and disagreement in the AI era, and jointly promote the sustainable development of artificial intelligence in a direction that "assists human development and is kind to humans."
In this issue, we are honored to invite Professor Xiao Yanghua to embark on a thought-provoking journey into the world of AI with us.
Xiao Yanghua is a professor and doctoral supervisor at the School of Computing and Intelligence Innovation, Fudan University; an AI scientist at the Shanghai Institute for Science and Intelligence; and Director of the Shanghai Key Laboratory of Data Science. He has long been engaged in research on big data and cognitive intelligence. He has won the 10-Year Impact Paper Award at ICDE 2024 and the Outstanding Paper Award at ACL 2023. He has published more than 300 papers in CCF-A and B-level journals, as well as three academic monographs and textbooks. He has received research awards from institutions such as Huawei, Alibaba, and Meituan. He serves as an associate editor or editorial board member of several international journals, such as Applied Intelligence.
Key Points:
1. In the past few years, the development of artificial intelligence has shown two very clear trajectories. One is represented by AIGC, or generative artificial intelligence. The other is represented by embodied intelligence. To understand why we say that AIGC is a technological revolution, we need to look at three fundamental aspects: First, whether this advanced technology is fundamental. Second, its impact on productivity improvement. Third, its influence on the entire social superstructure.
2. There is a next stage for embodied intelligence, which is the synergy between the body and the mind. In fact, this issue has been pondered at the philosophical level for a long time: how does human-level intelligence emerge? Imagine that when you feel physically comfortable, you feel happy. So, the body has a certain shaping effect on the brain. When we exercise, for example, when running to a certain extent, the brain will secrete dopamine. The body has an impact on the brain, and conversely, the brain also indirectly affects the body. Therefore, the body and the brain shape each other in a two-way manner.
3. At the current stage of AI, the quality of data and the strategy for training through trial-and-error have become the two most crucial factors. That is to say, the scale of data gives way to the quality of data, and the scale of computing power gives way to the design of algorithms.
4. There has always been a basic view in the industry that the model algorithm or architecture determines the lower limit of the model, while data determines the upper limit. From an industry perspective, the main responsibility of large clients such as central and state-owned enterprises is to organize and clean their industry data well. This is the key to the development of industry-specific AI.
5. The core of the consumer-oriented (ToC) applications of embodied intelligence is emotional intelligence. If robots are to enter thousands of households in the future, they must be able to empathize with us and understand our emotional needs to truly play a role in ToC applications.
6. To some extent, the data we collect today is still far from the critical point required for the emergence of generalization ability in embodied intelligence, with a difference of perhaps more than two or three orders of magnitude compared to large language models. There are two ways to promote embodied intelligence to approach the critical point as soon as possible: One is to increase the training volume when the data volume is insufficient. The other is to understand the human generalization mechanism, including the ability to draw inferences from one instance and the ability of induction and deduction.
7. In terms of the ideological origin of the development of artificial intelligence, we still have not deviated from the framework of the three routes mapped out by scientists in the 1950s and 1960s: symbolism, connectionism, and behaviorism. These three paradigms remain the core ideas for constructing a complete artificial intelligence solution today.
8. In the future, embodied intelligent robots will definitely follow a scenario-based and task-based path. They can be scaled up, but the degree of intensification should be appropriate. We need to consider whether the physical structure of the robot is feasible, rather than simply implanting more capabilities, which goes against industrial logic.
9. The human body is both an empowerment and a constraint. To some extent, human physical abilities limit our physical boundaries. A person cannot run to the moon on two legs, which is precisely the limitation of the human body on itself. In the future, we should think in reverse. To prevent AI from harming humans, we should equip AI with a body. By limiting the physical functions of the body, we can ensure human safety.
10. While AI with a body can indeed eliminate humans physically, it is more important to note that AI with high intelligence poses greater risks and damage to humans. How can we defend against the risks brought by AI? First, we need to conduct risk assessments and develop the profession of "AI risk supervisors" who can "pull the plug." Second, we need to strengthen the alignment of AI, which in turn requires first solving the problem of aligning the values of human society.
11. How can we prevent human degradation in the intelligent age? I think we need to do several things: First, we need to establish basic guidelines for the application of artificial intelligence. Second, we need to vigorously develop education and psychology. Third, we need to explore outward, expand our cognitive boundaries, and establish a new value system.
In future educational reforms, we cannot completely abandon our current core skills in pursuit of high-level abilities in the future. Future work will no longer be just a means of making a living but an experience that you enjoy.
Full Transcript
100 Questions from 100 Experts on AI & Society:
In the past three years, AIGC has driven a wave of practical applications of AI technology. However, some in the industry believe that embodied intelligence is the key direction for AI development. These two technological routes are currently advancing side by side. You have done in-depth research in the fields of large models and big data. From your perspective, could you outline the revolutionary characteristics that these two technologies share?
Xiao Yanghua:
In the past few years, the development of artificial intelligence has shown two very clear trajectories. One is represented by AIGC, or generative artificial intelligence. The other is represented by embodied intelligence.
Large models like ChatGPT essentially aim to enable machines to have the cognitive abilities of the human brain. Simply put, they aim to make machines think like humans. In fact, as early as 1950, Turing discussed the profound question of whether machines can think like humans in his groundbreaking paper, which opens with the question "Can machines think?" Scientists have been thinking about this question since the very beginning of computer design. The progress of generative large models today is essentially an answer to this question. To some extent, we can say that current large language models have learned the ability to generate human language and the logical thinking ability behind language, and even possess many cognitive abilities of the human brain.
The other route is embodied intelligence, whose fundamental purpose is to enable machines to acquire the perception and action abilities of the human body. In addition to the cognitive functions presented by the brain, human intelligence is also reflected in the ability of the five senses of the human body to perceive the world. For example, we use our eyes to see, our ears to listen, and our skin to touch. And with the support of these perception abilities, we can interact with the complex world efficiently and smoothly. These perception and action abilities are mainly determined and endowed by our bodies. Therefore, the fundamental purpose of embodied intelligence is to enable machines to imitate the human body's ability to perceive and interact with the world.
These two technological routes are two key intelligent forms that machine intelligence must go through on its way to AGI. In fact, the development of cognitive intelligence and embodied intelligence may have another important milestone (or stage) in the future, which is the synergy between the body and the mind.
Currently, machines have a "brain" and a "body," but there is still a fundamental gap between their ability to coordinate the body and mind and that of humans. In fact, philosophers have long pondered a profound question: how does human-level intelligence emerge? There were several opposing schools of thought. For example, some believed that our brain determines our intelligence, but later it was found that the body has an indispensable shaping effect on the brain. Imagine that when you feel physically comfortable, you feel happy; when you feel physically uncomfortable, your mood is usually not good. When we run to a certain extent, the brain will secrete dopamine, making us excited and happy. Your physical abilities also determine your range of actions, which fundamentally determines the cognitive boundaries of your brain. The cognitive model of the human brain is largely a metaphor for physical abilities. For example, we often say that one should be able to "pick things up and put them down" in life, which is essentially a metaphor for the ability of the arms. So, the body has a certain influence and even a shaping effect on the functions of the brain. Conversely, the brain also affects the body. For example, the brain is constantly controlling the body's interaction with the environment. Therefore, the body and the brain shape each other in a two-way manner.
Now, it seems crucial whether machines can overcome the challenge of body-mind coordination. Currently, machines still have various problems in body-mind coordination. Individually, the physical capabilities of current robots are becoming more and more advanced, and the perception and interaction abilities that machines can achieve are becoming increasingly powerful. Large models are also becoming stronger, and the cognitive abilities that machines can achieve are developing rapidly. However, when these two are combined, we find that robots may perform very "stupid" actions, which is essentially due to the lack of body-mind coordination ability in machines.
So, do the above technological paths constitute a technological revolution? In the long run, this involves a fundamental question: which is decisive, the brain or the body? In other words, is it the cognitive ability of the brain, or the perception, interaction, and action abilities of the body, that has the more decisive, revolutionary, and long-lasting impact on industrial and social development?
We can clearly see that when AI has the cognitive abilities of humans, that is, when machines have a "brain," it is definitely a technological revolution. Why do we say this is a technological revolution? We need to look at three basic aspects:
First, whether this advanced technology is fundamental. Traditional technological revolutions, such as those related to steam and electricity, have become infrastructure and are almost ubiquitous. So, the fundamentality criterion is met.
Second, its impact on productivity improvement. As an advanced technological revolution, it can exponentially increase productivity. Currently, with the support of AIGC, the efficiency of many tasks, especially mental work such as contract review, painting, and text generation, has increased by hundreds or thousands of times. After the emergence of AIGC, the production of various papers has increased significantly, to the extent that people are inundated with papers. This is indeed a significant improvement in paper-writing productivity. Without AIGC, many students would struggle to write well, but today they no longer have this problem. So, this is evidence of productivity, and AIGC is indeed a productive force.
Third, its influence on the entire social superstructure. When AI can think and has the abilities of the human brain, its influence on society is unprecedented in terms of breadth and depth. All human production and life activities that involve intelligence will be affected by generative artificial intelligence. For example, when listening to a report or attending a meeting, which requires the use of the brain, AI products can be developed for meeting transcription. So, AI with a "brain" will penetrate every aspect of society and be ubiquitous. Its widespread application may also lead to human mental laziness, so its impact on society is also very profound. There are already many discussions about whether people's brains will become useless if they rely too much on AIGC. These are essentially manifestations of the influence of AI with a "brain."
Therefore, according to the above three criteria, AIGC or generative large models are definitely a new technological revolution. However, in the case of embodied intelligence, if its purpose is only to enable machines to have the perception and action abilities of the human body or animal body, the impact of having physical abilities on productivity improvement may be less than that of large-scale population growth or a policy that encourages population growth.
Assume that the current global population is 8 billion, and in the future, there will be 8 billion robots working for us. To some extent, the production capacity of these 8 billion robots can be achieved by doubling the human population. Considering the R&D and maintenance of machines, the production capacity created by a humanoid robot can be equivalent to that of one or two human individuals, which is equivalent to a one-to-two-fold increase in the human population. So, in this sense, while having a body can promote productivity improvement to a certain extent, this gain is a constant factor. Compared with enabling machines to have human-level thinking abilities, the liberation of productivity is at a different level. Especially for humanoid robots in embodied intelligence, their commercial scenarios still need to be explored. Which scenarios really require the use of humanoid robots? People may think that there are many tasks that require the human body, such as accompanying the elderly. However, once humanoid robots are mature enough to enter families and daily life, their development will be affected by two factors that cannot be ignored: safety and ethics.
From a safety perspective, if a robot is serving you, it may fall and accidentally injure humans. For safety reasons, you will also limit its application. From an ethical perspective, there are more factors to consider. For example, in the future, the technology may be very mature, and robots may be very cheap. However, no matter how carefully a robot takes care of an elderly person, it cannot replace the greetings and companionship from children.
In this sense, safety and ethics will limit the application scenarios and scale of humanoid robots to a certain extent.
Therefore, in terms of the influence on society and life, enabling machines to have a body has a much more limited impact than enabling them to have a brain. The development of embodied intelligence we see today is largely a natural extension after AI has made great progress and breakthroughs in simulating the human brain. So, I tend to think that the current development of embodied intelligence is still a technological extension after the breakthrough of cognitive intelligence. Even if a breakthrough is made in embodied intelligence at the human-body level, its impact on the industry and society is not as fundamental and revolutionary as enabling machines to have human-level cognitive abilities. For humans, a body without a brain is at best a walking corpse. The same is true for machines. Only with the support of cognitive abilities can embodied abilities have revolutionary significance, industrial value, and form real new-quality productive forces.
100 Questions from 100 Experts on AI & Society:
Thank you, Professor Xiao. You have provided a relatively complete summary of the revolutionary characteristics of AI. That is, whether it is as fundamental as water, electricity, and gas, whether it can exponentially increase our productivity, and whether it has a disruptive impact on social, economic, and daily life. These seem to be your core criteria for judging the revolutionary nature of AI. What you said makes me think of digitization or the Internet, which also possess these three characteristics. So, the next question is, after the transition from digitization to intelligence, or after the foundation of digitization is laid, we have witnessed the emergence of generative artificial intelligence. In its evolution path, what are the underlying laws? Can these laws help robots become more general-purpose? Many people believe that embodied intelligence has the potential to become a general-purpose artificial intelligence. Could you abstract and summarize these laws and clues for us?
Xiao Yanghua:
Looking back at the development of generative artificial intelligence in the past two years, we can find that there are indeed certain patterns or laws. The most typical pattern is the so-called scaling law, which means that massive data and large-scale computing power play a decisive role in stimulating the capabilities of AI. Large-model manufacturers generally still follow the scaling law, and their technological routes are similar. To further enhance the capabilities of their large models or embodied intelligence, the core work of major manufacturers is to collect more high-quality data, purchase or rent more computing power, and use that additional computing power and data to stimulate the potential of large models. The development model based on the scaling law has led to a series of advances in generative artificial intelligence. To some extent, the scaling law has also been extended to the development of embodied intelligence and other forms of intelligence. We can see that practitioners of embodied intelligence are trying their best to collect, synthesize, and generate larger-scale and higher-quality embodied data, which is essentially because they believe that the scaling law is still effective for the development of embodied intelligence.
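As a rough illustration of what a scaling law looks like in practice, the sketch below uses a parametric power-law form that is common in the scaling-law literature: loss falls as model size and data grow, but each doubling buys less than the one before. This is not a formula given in the interview. The function names and every constant here are hypothetical values chosen only to show the shape of the curve.

```python
def power_law_loss(n_params: float, n_tokens: float,
                   floor: float = 1.7, a: float = 400.0, b: float = 400.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Hypothetical loss curve: floor + a / N^alpha + b / D^beta.

    N is the number of model parameters and D the number of training tokens.
    All constants are illustrative assumptions, not fitted values.
    """
    return floor + a / n_params ** alpha + b / n_tokens ** beta


def gains_from_doubling_data(n_params: float, start_tokens: float,
                             doublings: int) -> list[float]:
    """Return the loss reduction obtained by each successive doubling of data."""
    gains = []
    tokens = start_tokens
    for _ in range(doublings):
        before = power_law_loss(n_params, tokens)
        after = power_law_loss(n_params, tokens * 2)
        gains.append(before - after)
        tokens *= 2
    return gains


if __name__ == "__main__":
    # Fixed model size (1e9 parameters), data doubled five times from 1e9 tokens.
    for step, gain in enumerate(gains_from_doubling_data(1e9, 1e9, 5), start=1):
        print(f"doubling {step}: loss reduced by {gain:.4f}")
```

Under these assumed constants, every doubling of data still lowers the loss, but by a smaller amount each time. That is one way to read why simply adding data and computing power becomes steadily more expensive per unit of improvement.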
However, when we review and reflect on the current development path of generative artificial intelligence, we must pay attention to the new development model represented by the rise of strong-reasoning large models such as DeepSeek. The successful