The Sims, world-model edition: AI virtual characters give street speeches to canvass for votes, and GPT-4o wins the election.
A real-world simulator.
Once world models have become highly evolved, what are the "people" inside them doing?
Some give street speeches and draw a sizable crowd, while kids play with robotic dogs.
Some commit crimes in public and are arrested by the police, while others propose marriage in public.
This Friday, researchers from the University of Massachusetts Amherst, Johns Hopkins University, and Carnegie Mellon University proposed a fascinating study: Virtual Community.
The Virtual Community combines real-world geospatial data with generative models to create an interactive, scalable, socially grounded open world for various types of agents.
Paper: Virtual Community: An Open World for Humans, Robots, and Society
Paper link: https://virtual-community-ai.github.io/paper.pdf
Project link: https://virtual-community-ai.github.io/
This work was submitted last night and immediately attracted the attention of some big names in the AI community. Saining Xie, an assistant professor at New York University, said that it is of great significance for agent research.
The Virtual Community provides a unified framework for simulating the rich social and physical interactions between humans and robots in a community. It is built on a general physics engine and is based on real-world 3D scenes. The authors implemented a virtual character simulation framework for human agents, while the robot simulation mainly inherits from Genesis.
The Virtual Community supports the generation of agent communities based on 3D scenes by populating the environment with agents (powered by LLMs) configured with robots, human character profiles, and social relationship networks.
Each of these characters has detailed background information and an activity schedule, and acts according to these settings. Their social relationships are organized into groups, with each group containing a set of agents, a text description, and a designated venue for group activities. In this way, the characters are connected into a cohesive community.
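To make the group structure described above concrete, here is a minimal sketch in Python. All class names, field names, and sample data (`AgentProfile`, `Group`, `neighbors`, the chess club, and so on) are hypothetical and chosen for illustration only; they are not taken from the Virtual Community codebase.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical character record: background text plus an hourly schedule."""
    name: str
    background: str
    schedule: dict[int, str] = field(default_factory=dict)  # hour -> activity


@dataclass
class Group:
    """Hypothetical social group: members, a text description, and a venue."""
    description: str
    venue: str
    members: set[str] = field(default_factory=set)


def neighbors(groups: list[Group], name: str) -> set[str]:
    """Return every agent that shares at least one group with `name`."""
    linked: set[str] = set()
    for group in groups:
        if name in group.members:
            linked |= group.members
    linked.discard(name)
    return linked


# Invented sample data: two overlapping groups link three agents together.
groups = [
    Group("Weekend chess club", "Central Library", {"Ava", "Ben"}),
    Group("Morning runners", "Riverside Park", {"Ben", "Chen"}),
]
print(sorted(neighbors(groups, "Ben")))
```

Because groups can overlap, an agent who belongs to several of them acts as a bridge between otherwise separate circles, which is one plausible way a set of groups turns into a connected community.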
The Virtual Community generates scenes and corresponding agents based on real-world geospatial data. As shown in the figure below: The scene generation component (A) uses generative models to enhance textures and refine rough 3D data, while refining geospatial data to simplify geometric structures. It also uses generative methods to create interactive objects and detailed indoor scenes. The agent generation component (B) uses LLMs to generate agent characters and social relationship networks based on scene descriptions. (C) Then, based on the Genesis engine, it simulates virtual character communities and robots in an open-world scenario.
Interestingly, it can simulate 3D scenes anywhere in the world, constructing a large-scale community for agents, from New York to London, Amsterdam, Denver, and beyond.
Existing 3D geospatial data APIs provide rich data in terms of quantity and diversity, but they usually contain a lot of noise and lack details in textures and geometric shapes. To bridge this gap, the authors proposed an online process to comprehensively clean and enhance geometry and textures. This process consists of four steps: mesh simplification, texture refinement, object placement, and automatic annotation.
The authors used this process to generate annotated scenes of 35 different cities around the world:
The Virtual Community also has a functioning transportation system, including pedestrian movement, vehicle flow, and public transportation operations. The authors developed an automated dynamic traffic generation mechanism based on OSM data, which can quickly reconstruct urban road networks and achieve autonomous traffic simulation globally.
Because the platform is meant to help train future collaboration between humans and machines, robots are an indispensable part of the Virtual Community. They are everywhere and interact seamlessly with the "humans" in it. Currently, the imported robots include Unitree's humanoid robots, Boston Dynamics' robotic dogs, quadcopter drones, Google robots, and others.
Taking advantage of the new features offered by the Virtual Community, the authors introduced two new embodied multi-agent tasks: a campaign task involving multiple human agents and a community assistant task involving both robots and human agents. To successfully complete these tasks, agents need to have the ability to plan in a community environment and the social intelligence to interact with other agents.
Both tasks build on the same default behavior: unless assigned a specific task, agents in the community follow their daily plans and routines. In each round, multiple agents are selected and assigned a task. When an agent is given a task, it pauses its daily plan and focuses on completing the social task it has been assigned in the community.
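The scheduling rule described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the `CommunityAgent` class, its method names, and the "idle" fallback are assumptions, and so is the choice to resume the daily plan once a task is completed (the article only says the plan is paused).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommunityAgent:
    """Hypothetical agent whose assigned task overrides its daily plan."""
    name: str
    daily_plan: dict[int, str]  # hour of day -> routine activity
    task: Optional[str] = None  # currently assigned task, if any

    def assign(self, task: str) -> None:
        """Assign a task; the daily plan is paused while it is active."""
        self.task = task

    def complete(self) -> None:
        """Mark the task as done (assumption: the daily plan then resumes)."""
        self.task = None

    def current_activity(self, hour: int) -> str:
        """An active task takes priority; otherwise fall back to the plan."""
        if self.task is not None:
            return self.task
        return self.daily_plan.get(hour, "idle")


agent = CommunityAgent("Ava", {9: "work at the cafe", 18: "jog in the park"})
print(agent.current_activity(9))   # routine activity from the daily plan
agent.assign("campaign for votes")
print(agent.current_activity(9))   # the task overrides the plan
agent.complete()
print(agent.current_activity(18))  # back to the routine
```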
In the "campaign" task, candidate agents must efficiently plan to connect with and persuade voter agents in the community. Since voters have different personalities and social relationships, some voters may initially lean towards certain candidates. This requires each candidate to develop adaptive strategies to influence and change voters' opinions throughout the election process.
The results are shown in the figure below. Candidates using the GPT-4o backbone have a higher average vote-getting rate and conversion rate than those using the GPT-3.5-turbo backbone, which suggests that GPT-4o is better at changing voters' views in most scenarios.
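The article does not give the formulas behind these two metrics, so the sketch below uses assumed definitions: the vote-getting rate is taken to be a candidate's share of all final votes, and the conversion rate is taken to be the fraction of voters who did not initially lean towards the candidate but ended up voting for them. The function names and the ballot data are hypothetical, and the paper may define both quantities differently.

```python
from typing import Optional


def vote_rate(final: dict[str, str], candidate: str) -> float:
    """Assumed definition: share of all final votes won by `candidate`."""
    return sum(choice == candidate for choice in final.values()) / len(final)


def conversion_rate(
    initial: dict[str, Optional[str]],
    final: dict[str, str],
    candidate: str,
) -> float:
    """Assumed definition: among voters NOT initially leaning towards
    `candidate`, the fraction who ended up voting for `candidate`."""
    targets = [voter for voter, lean in initial.items() if lean != candidate]
    if not targets:
        return 0.0
    return sum(final[voter] == candidate for voter in targets) / len(targets)


# Invented example: initial leanings (None = undecided) and final votes.
initial = {"v1": "A", "v2": "B", "v3": "B", "v4": None}
final = {"v1": "A", "v2": "A", "v3": "B", "v4": "A"}

print(vote_rate(final, "A"))                 # 3 of 4 final votes
print(conversion_rate(initial, final, "A"))  # 2 of 3 non-supporters won over
```

Separating the two numbers matters because a candidate can post a high vote rate simply by starting with many sympathetic voters; the conversion rate isolates how many opinions were actually changed.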
In the community assistant task, two heterogeneous robots cooperate to assist humans in an open-world environment. The task requires agents to plan cooperatively in order to help human avatars with two kinds of daily activity: carrying, where agents accompany people outdoors and help carry items; and delivery, where agents transport items from a source location (indoor or outdoor) to a destination.
The experimental results show that both baseline methods perform better in delivery than in carrying, which reflects the extremely high difficulty of simultaneously manipulating objects and following humans in a dynamic open world.
The authors hope that the Virtual Community work can help people conduct large-scale future research on social intelligence, including: 1) how robots can cooperate or compete intelligently; 2) how humans develop social relationships and build communities; 3) how intelligent robots and humans can coexist in an open world.
The following are the team members of this research:
For more detailed content, please refer to the original paper.
This article is from the WeChat official account "Almost Human" (ID: almosthuman2014). Authors: Zenan, Yang Wen. Republished by 36Kr with authorization.