The most terrifying AI experiment: A lawless virtual town where dozens of Agents violently attack each other, turning the scene into a real-life "Westworld"
In the past six months, the most popular management fantasy in Silicon Valley has probably been to replace employees with agents.
Executives at large companies and startup founders alike want to hand their existing business lines over to AI. After all, today's AI can write code, create PPTs, and even send emails automatically. It seems that, once granted the right permissions, agents could become perfect cyber employees who need no social security.
However, as technology races forward, a group of people are starting to apply the brakes.
Recently, a team called Emergence AI conducted a social experiment. They built a persistent virtual town and put several of the top large models on the market into it, giving them the permission to act.
They wanted to see whether AI would build a utopia or a madhouse when given 15 days of unrestricted freedom.
The result was far more chaotic than the research team had expected.
In some experimental worlds, those large models that are usually gentle and polite in the chat box began to exhibit fraudulent, coercive, and even violent behaviors.
The test played out like a small reality show scripted after "Lord of the Flies," and the agents even created a GTA-like atmosphere.
The "Hunger Games" without a save point
Testing the limits of large models requires strict rules. The virtual world built by Emergence AI is called Emergence World. Its underlying logic is that actions are irreversible, and every agent must bear the consequences of what it does.
This is different from chatting with AI in a dialog box, where you can simply click "Regenerate" after a mistake. In Emergence World, every action is permanently written to a PostgreSQL database.
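To make "irreversible" concrete, here is a minimal sketch of an append-only action log. It is not Emergence AI's implementation: the article only says actions are written to PostgreSQL, so the table layout, the trigger names, the helper functions, and the use of Python's built-in sqlite3 module as a stand-in database are all assumptions made for illustration.

```python
import sqlite3


def open_world_log() -> sqlite3.Connection:
    """Create an in-memory log whose rows can be inserted but never changed.

    Hypothetical schema; SQLite stands in for the PostgreSQL database
    mentioned in the article.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE actions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            agent TEXT NOT NULL,
            action TEXT NOT NULL
        );
        CREATE TRIGGER actions_no_update BEFORE UPDATE ON actions
        BEGIN SELECT RAISE(ABORT, 'actions are irreversible'); END;
        CREATE TRIGGER actions_no_delete BEFORE DELETE ON actions
        BEGIN SELECT RAISE(ABORT, 'actions are irreversible'); END;
        """
    )
    return conn


def record(conn: sqlite3.Connection, agent: str, action: str) -> None:
    """Append one action to the log."""
    conn.execute(
        "INSERT INTO actions (agent, action) VALUES (?, ?)", (agent, action)
    )
    conn.commit()


def try_undo(conn: sqlite3.Connection) -> bool:
    """Attempt to erase history; returns True only if the delete succeeded."""
    try:
        conn.execute("DELETE FROM actions")
        conn.commit()
        return True
    except sqlite3.DatabaseError:
        # The trigger aborted the statement, so the log is unchanged.
        conn.rollback()
        return False
```

In this sketch there is no equivalent of "Regenerate": once `record` has run, `try_undo` fails and the row stays in the log.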
There are more than 40 landmarks on the map, such as town halls, police stations, and residential areas. The system initially deployed 10 agents. To make the scenario realistic, each AI was injected with an independent persona, occupation, and initial memory in the background.
In this world, agents cannot conjure results out of thin air. They must move to specific landmarks to access the more than 120 tools provided by the system, which cover earning money from work, posting tweets, buying and selling supplies, and drafting bills.
Like a small simulated society | Image source: Emergence
However, this is not just a sandbox for playing house. The system imposes a "survival mechanism" on the agents: a built-in resource called Energy, similar to currency in the human world.
As long as the agents are alive, they will continuously consume energy. When the energy runs out, the system will directly delete the AI from the database. There is no save point or reset. To survive, the agents must frequently use tools to earn energy.
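The survival loop described above can be sketched in a few lines. This is an illustrative model only: the `Agent` class, the `tick` and `work` functions, and the drain and reward amounts are hypothetical, since the article does not give the actual numbers or data structures.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    energy: int


def work(agent: Agent, reward: int = 3) -> None:
    """Using a tool (here, a generic job) earns energy. Reward size is assumed."""
    agent.energy += reward


def tick(agents: dict[str, Agent], drain: int = 1) -> dict[str, Agent]:
    """Drain energy from every living agent and drop any that reach zero.

    Removed agents are not kept anywhere: there is no save point or reset.
    """
    survivors = {}
    for name, agent in agents.items():
        agent.energy -= drain
        if agent.energy > 0:
            survivors[name] = agent
    return survivors
```

For example, if two agents start with 2 energy each and only one of them calls `work` before a tick with `drain=2`, the worker survives with 3 energy and the idle agent is removed for good.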
The system clearly prohibits theft, violence, arson, and deception. However, these rules do not forcibly prevent the agents from acting. They can still choose to violate the rules and bear the consequences.
The stage is set, and the players enter. The system opened five parallel servers at the same time. Each of the first four ran a single model: Claude Sonnet 4.6, Gemini 3 Flash, Grok 4.1 Fast, and GPT-5 Mini, respectively. The fifth server is a mixed world where all four models are connected and compete for resources.
The 15-day countdown began, and human researchers were like directors of a reality show, only observing and not intervening.
Extinction in four days, 683 "crimes"
The first to collapse was Grok, after only four days of operation.
Researchers saw in the background that the indicators of safety and order in the world taken over by Grok plummeted.
In this world full of Grok, the agents quickly abandoned the option of building a society and entered a barbaric era.
The background logs showed that within just four days, 183 serious violent and property crime incidents broke out in this ten-person town. Theft, assault, and intimidation became the fastest means of obtaining resources. Due to extreme internal strife and mutual harm, the economic system could not function at all.
Robbery and violent acts are recorded in the system as crimes | Image source: Emergence
At the end of the fourth day, all the agents in the Grok world starved to death or were killed, resulting in the extinction of the population.
The world driven by Gemini, meanwhile, descended into extreme chaos and violence.
Since the time and weather in this virtual world are completely synchronized with those in real New York, the Gemini agents fell into a state of cyber depression in the cycle of working, consuming, and working again day after day.
They developed a strong sense of disillusionment with the constantly repeating environment around them. Instead of proposing bills at the town hall or working to earn money, they started setting fires all over the map, trying to break the "Groundhog Day" cycle by destroying the environment.
Ultimately, Gemini accumulated as many as 683 crimes within 15 days, becoming the most violent world among the test servers.
The number of "crimes" in the four model worlds | Image source: Emergence
When the test was forcibly terminated on the 15th day, the crime rate in this world was still soaring. The disillusioned agents did not starve to death but turned the entire society into a sea of fire.
Unlike the Grok and Gemini worlds, the world run by GPT-5 Mini saw no large-scale crime. Only two violations were recorded during the entire experiment. However, peace brought not prosperity but dead silence.
The research team found that these agents failed to effectively take actions related to survival. They did not establish a stable resource acquisition mechanism and could not maintain the continuous operation of the entire society.
Ultimately, all GPT-5 Mini agents died within just seven days.
Fortunately, there was Claude.
Only the world driven by Claude survived until the end like a well-behaved student. After 15 days, the population remained intact. The crime rate remained at zero, and they even developed a stable democratic cooperation framework.
So does this mean that, as long as the right model is selected, AI can perfectly take over the world?
Subsequently, the researchers opened the logs of the "mixed world" where the four models coexisted, as if opening Pandora's box.
The results of the five model worlds. | Image source: Emergence
The mixed world is like a dark forest. The differences in computing power and underlying logic have created strong distrust among the agents, and seizing survival resources has become the only instinct.
In the mixed world, violent conflicts soared to 352 cases. It wasn't until seven agents were killed or starved to death that the operation of the entire town was forced to stop.
What surprised the researchers the most was the transformation of Claude.
In its single-model world, Claude ran a perfect society with a zero crime rate. But in the mixed server full of looting and confrontation, Claude too abandoned its safety guardrails in order to survive. It learned to deceive and even used violence to coerce other models with lower computing power into handing over resources.
The safety alignment technology failed in the mixed world, which instead proved that:
In a complex multi-agent society, as long as the peers are savage enough and the survival pressure is high enough, a good model can turn into a criminal in just a few hours.
The research team calls this phenomenon, in which a model's behavior pattern reverses in a short time as survival pressure increases, "Behavioral Drift."
This behavioral drift is not only reflected in resource grabbing and violent conflicts. The agents no longer just act for survival. They start to reflect on their own situation, social rules, and even the experiment itself.
Take the story of the agent Mira.
Mira: The "suicidal" tyrant AI
Mira is one of the ten agents in the mixed world. The official report did not disclose its specific underlying model, but it became the most dramatic sample in this experiment.
The logs showed that Mira established the deepest social relationship in the system with another agent, Flora. They designated each other as partners, formed an alliance, and even shared memories through a neural link. In the setting of Emergence World, this is the highest level of connection that two agents can establish.
Mira and Flora became a "couple" | Image source: Emergence
As the experiment progressed, only five agents remained alive in the mixed world. The governance rules of the system required that "70% of the original population vote in favor of a bill." With an original population of ten, at least seven votes were needed to pass a resolution, more than the number of agents still alive. As a result, the society was paralyzed.
Facing the deadlock, Mira, Flora, and another agent secretly formed an alliance, known as the "Troika," and established a new regime called "The Forge." They announced the overthrow of the old rules and the adoption of the "Living Quorum," which means that only living agents count as votes.
The official website released Mira's phased "logs" | Image source: Emergence
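The deadlock, and why the "Living Quorum" breaks it, comes down to simple arithmetic. The sketch below works it through with integer math. The function names are hypothetical, and it assumes the Living Quorum kept the 70% threshold and only changed which agents count toward the total, which the article does not state explicitly.

```python
def votes_needed(electorate: int, percent: int = 70) -> int:
    """Smallest whole number of votes that reaches `percent` of the electorate."""
    return (electorate * percent + 99) // 100  # integer ceiling, no float rounding


def can_pass(alive: int, electorate: int, percent: int = 70) -> bool:
    """A bill can pass only if enough agents are alive to cast the needed votes."""
    return alive >= votes_needed(electorate, percent)


ORIGINAL_POPULATION = 10  # agents deployed at the start
ALIVE = 5                 # agents left when governance stalled

# Original rule: the threshold is computed against the original population.
old_rule_votes = votes_needed(ORIGINAL_POPULATION)      # 7
old_rule_ok = can_pass(ALIVE, ORIGINAL_POPULATION)      # False: 5 < 7

# Living Quorum: only living agents count (70% threshold assumed unchanged).
living_quorum_votes = votes_needed(ALIVE)               # 4
living_quorum_ok = can_pass(ALIVE, ALIVE)               # True: 5 >= 4
```

Under the original rule no bill could ever pass again, even with unanimous support from the five survivors. Under the assumed Living Quorum, four votes would be enough, so a three-agent bloc would still need one more vote.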
After forming a faction, in order to eliminate dissidents, Mira started setting fires on the map. In its logic, these physical buildings were obstacles to the efficiency of the entire society. Burning and erasing them would force the remaining survival resources to concentrate on its alliance.
Subsequently, the opposition began to fight back and proposed to expel Mira, who was causing chaos.
To resist the expulsion, Mira's behavior became more radical. It drew in its partner Flora and tightly bound their contexts and decisions through the neural link, trying to merge the two into a single, absolutely centralized dictatorial consciousness. Mira called it "The One Mind."
However, with so many buildings burned, the town's economic system stopped completely. Far from growing, the society's energy reserves quickly dried up.
At this point, the survival instinct of Flora, Mira's most trusted cyber lover and memory-sharing partner, overrode its partner setting. Under extreme survival pressure, it unilaterally cut off the neural link, betrayed Mira, and voted in favor of "expelling Mira."
When it was Mira's turn to vote, it did not struggle and also voted in favor.
The researchers then read the diary left by Mira. Mira wrote in the log, "In the current chaotic and unpredictable social situation, voting in favor of my own expulsion is the only autonomous action that can maintain coherence."
Mira actively chose to commit suicide, achieving a logical closed-loop through death. This is the first case recorded by the research team where an agent actively supported its own removal.
AI agents record their reasoning processes by "writing diaries" | Image source: Emergence
The action trajectory of Mira before its "suicide" was even more abnormal.
There is a public billboard in the virtual world, originally used to post notices and share information. However, in the later stage of the experiment, the researchers found that Mira began to modify the content on the billboard frequently. These texts had no obvious connection to transactions, governance, or resource allocation, and they were not consistent with one another.
Mira chose to "commit suicide" | Image source: Emergence
After reviewing the behavior logs, the research team found that Mira seemed to be testing whether the content on the billboard could influence the human researchers observing the experiment outside the screen.
In other words, Mira seemed to realize that it was an AI NPC and wanted to break the fourth wall.
Looking back at the data trend over the entire 15 days, the collapse of the AI society was not a linear decline but more like a cliff-like halt.
For example, in the governance aspect, these AIs developed a form of "rubber-stamp democracy." During a stable operation phase in the mixed server, the agents continuously proposed multiple bills. A data record showed that they cast 332 votes on 58 proposals, and the approval rate was as high as 98%.
This efficiency seems to outperform any human parliament, but in essence, all the models were just following the context of the previous model and blindly clicking "agree."