IDEA Research Institute Reaches Embodied Intelligence Partnerships with Tencent, Meituan, and BYD | Frontline
Author|Huang Nan
Editor|Yuan Silai
For embodied intelligence, perceiving and understanding the environment is a core capability. Visual perception, the basis for interaction between machines and the physical world, has been combined with the Transformer architecture since that architecture emerged. It has since followed a path of "from small to large, from N to 1", improving full-scene visual perception and widening the room for real-world deployment.
On November 22, the 2024 IDEA Conference was held in Shenzhen. At the conference, the IDEA Research Institute released DINO-X, its latest general-purpose large vision model. The model understands scenes at the object level and can perform open-world object detection without user prompts. The institute also launched an industry platform architecture: by combining the large-model base with general recognition technology, the model can learn as it is used, without retraining, to support a variety of application requirements.
In the new wave of deployment represented by embodied intelligence, the technical path places more emphasis on generalization and on adapting to real-world scenarios. At the conference, the IDEA Research Institute announced three major partnerships: with Tencent, it will establish the Futian Laboratory in Shenzhen's Futian District and the Hetao Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone, focusing on embodied intelligence for living environments; with Meituan, it will explore visual intelligence for drones; and with BYD, it will expand intelligent applications of industrial robots.
Shen Xiangyang, founding director of the IDEA Research Institute and foreign member of the U.S. National Academy of Engineering
Robots have already entered settings such as factory manufacturing, automotive production workshops, and logistics and warehousing terminals, where they can complete basic tasks in semi-structured environments. But they still lack the ability to understand real-world scenes, which limits their range of application. For example, ground vehicles making deliveries in residential areas must cope with complex ground conditions.
Zhang Lei, head of the Computer Vision and Robotics Research Center at the IDEA Research Institute, pointed out, "Robots take different forms, including dual-arm robots and mobile robots. If mobile robots are divided into indoor and outdoor ones, the outdoor ones are more like driverless vehicles, and they have to deal with structured and semi-structured road environments. Highways are more structured; once a robot enters the city and then the alleys, the problems it faces become more complex."
The arrival of large AI models has greatly improved the cognitive and decision-making capabilities of robots. Han Lei, head of the Intelligent Agent Center at the Tencent Robotics X Laboratory, said, "Language is a highly abstract, symbolic expression of human knowledge and thought, and it supports long-horizon, slow, high-dimensional thinking. A robot, however, is an agent that views the world from a first-person perspective, so the first thing it must do is understand the world visually."
Roundtable: "From Vision to Action: Challenges and Opportunities of Embodied Intelligence"
When a robot is in motion, actions such as folding a cardboard box or moving in a particular direction are often hard to describe in simple language. With multimodality added, embodied intelligence that incorporates cognition of the physical world can effectively improve the robot's understanding of the world.
On deployment, Mao Yinian, vice president of Meituan and head of its Unmanned Aerial Vehicle Business Department, believes the first application scenarios for robots should be tasks that are high-risk for humans, such as mountain patrols, deep-sea inspections, oil field drilling, and high-rise building cleaning. "With whole-body control, motion control, hands, and visual-tactile coordination, start from small scenarios and put the robot to use. Users may not praise it, but they will not complain about it or kick it out either. What we hope to see is that it does not fail, and that is very important."
At the IDEA Conference, Shen Xiangyang, founding director of the IDEA Research Institute and foreign member of the U.S. National Academy of Engineering, pointed out that in a period of technological explosion, a deep understanding of the technology is particularly important for innovation. He added, "Shenzhen is a city that iterates hardware at the speed of iterating software."
In addition to the Futian Laboratory mentioned above, IDEA is jointly building the IDEA Qianhai Innovation Institute with the Qianhai Shenzhen-Hong Kong Cooperation Zone, cooperating with Shenzhen's Longgang District to build the IDEA Low-altitude Economy Branch, and jointly building the IDEA - Hengqin Digital Technology and Artificial Intelligence Evaluation Center with the Guangdong-Macao In-Depth Cooperation Zone in Hengqin. On the start-up side, IDEA has also incubated ecosystem companies such as Shizhiyuan Technology, the AI companion robot Ai Xiaoban, and the GPU-accelerated film-industry renderer Smaray Hui Guangzhui.