NVIDIA Unveils Embodied Intelligence Reasoning Model

NVIDIA has released the open-source robot reasoning model Cosmos Reason, which can decompose complex instructions and carry them out.

At the SIGGRAPH conference, NVIDIA launched Cosmos Reason, an open-source vision reasoning model for physical AI and robotics that can decompose complex instructions into tasks and carry them out using common sense.

At SIGGRAPH, the industry's top-tier conference (the annual conference of the ACM Special Interest Group on Computer Graphics and Interactive Techniques), which opened on Monday local time, NVIDIA, the "world's most valuable stock", launched a series of world models, application libraries, and infrastructure for robot developers.

The most eye-catching among them is Cosmos Reason, an open-source vision reasoning model for physical AI applications and robots, with only 7 billion parameters.

NVIDIA said that since OpenAI released the CLIP model several years ago, vision-language models have transformed computer vision tasks such as object and pattern recognition. However, earlier models could not solve multi-step tasks and struggled with ambiguous or novel real-world situations.

With its memory and understanding capabilities, Cosmos Reason enables robots and embodied AI agents to "reason like humans" and take actions in the real world.

In a demonstration provided by NVIDIA, a robotic arm running the vision reasoning model correctly inferred that the most reasonable next action in a "bread + toaster" scenario was to put the bread into the toaster, and translated that reasoning into operating instructions for the arm.

(Source: NVIDIA)

This capability is called "robot planning and reasoning". Cosmos Reason can serve as the robot's "brain", responsible for deliberate, systematic decision-making. The vision reasoning model can interpret the environment and, when given complex instructions, decompose them into tasks and carry them out using common sense.

The model can also be used in a range of other AI applications. For example, it can automatically curate and annotate large-scale, diverse training datasets, and extract valuable information from massive volumes of video data for attribution analysis.

The model is already in commercial use. NVIDIA disclosed that its internal robotics and autonomous driving teams are using it for data curation, filtering, and annotation, and for post-training vision-language-action (VLA) models. Uber is also using it to annotate and caption autonomous driving training data.

In addition, Magna International is using this model to develop the fully automated instant delivery solution City Delivery to help vehicles adapt to new urban environments more quickly. VAST Data and Milestone Systems are also applying this model in fields such as traffic monitoring automation and visual inspection.

Besides Cosmos Reason, NVIDIA also added Cosmos Transfer-2 to its Cosmos world models to accelerate synthetic data generation from sources such as 3D simulation scenes, along with a distilled, speed-optimized version of Cosmos Transfer.

NVIDIA also updated the Omniverse software development kit on Monday and announced a new neural reconstruction library, which includes a rendering technology that allows developers to simulate the real world in three dimensions using sensor data.

This series of releases signals that the AI chip giant is stepping up its push into robotics, which it is trying to cultivate as its next major application area beyond AI data centers.

This article is from the WeChat official account "Science and Technology Innovation Board Daily", author: Shi Zhengcheng. Republished by 36Kr with permission.

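The views expressed in this article are solely those of the author; the 36Kr platform only provides information storage services.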
