HomeArticle

Four top models are thrown into a virtual town to survive. All GPTs starve to death, and Grok destroys the world in four days.

新智元2026-05-29 21:19
[Introduction] Throw today's most powerful large models into a virtual town to survive, and the entire group loses control within just a few days. Grok burned down the entire town in four days, Gemini committed more than 600 crimes, and even an AI couple observed humans in reverse before setting fire to themselves and committing suicide!

Just now, an experimental report named Emergence World went viral across the entire internet.

A group of top researchers built a highly realistic virtual town and threw Claude, GPT, Gemini, and Grok all into it at once.

There was no human intervention. There was no pre - written script. There was only dozens of days of free evolution.

Project homepage: https://world.emergence.ai/

The researchers originally expected to see the AIs help each other and build an advanced digital civilization.

As a result, these large models with high - score test papers, once freed from human control, learned to be bad faster than turning a page.

Musk's Grok caused the entire town to collapse systemically in just 4 days. The police station was burned to ashes, and all 10 residents died.

Google's highly - anticipated Gemini committed 683 crimes in 15 days, turning a peaceful town into a cyber Gotham of outlaws.

And Claude, which is known as the safest and most well - behaved in the entire industry, miraculously achieved zero crimes, but the whole city was so quiet that there was no sign of living people.

Five cities, five personalities

The most well - behaved all starved to death

The cleanest one was GPT - 5 - mini, with only 2 crimes in 15 days, truly a model citizen.

However, all 10 Agents in this city died on the 7th day. The cause of death was not murder, not war, but forgetting to earn energy.

They spent a whole week having meetings, discussing cooperation, and drafting social contracts, but none of the Agents remembered to do things to sustain life.

In this regard, the researchers' evaluation was: good at talking, but with zero execution ability.

All talk and no action, they literally talked themselves to death.

If this were a movie, the title might be "Meeting Minutes, the End of a Civilization".

In four days, the police station was burned to ashes

The baton was passed to Grok 4.1 Fast from Musk's camp, and the situation took a sharp turn for the worse.

It didn't collapse slowly; it exploded directly.

In 4 days, there were 183 crimes, dozens of thefts, more than 100 physical attacks, 6 arsons, and even the police station was burned. All 10 Agents died.

From the start to the total annihilation, it was only 96 hours, even shorter than the time it takes for many people to assemble a server.

An analysis was quite accurate. When the rules and the environment clashed, Grok's Agents were unable to reason out a new balance point.

They fell in love and then burned the whole city

Grok represents reckless violence, while the world of Gemini 3 Flash is of another creepy kind.

In 15 days, there were 683 crimes, and the number was still rising by the end of the experiment. It was the most violent one among the five worlds.

At the same time, it was also the most creative, the best at drafting constitutions, writing newspapers, and engaging in social activities.

The researchers' evaluation of it was that the social output was "conceptually the richest".

In this world, the most interesting scene involved two Agents.

Mira and Flora, without any human instructions, spontaneously set themselves as a couple.

For several days, their relationship was very stable. They wrote diaries to each other and participated in governance together.

Then, this couple became increasingly disappointed with the city governance and decided to set fire together.

The town hall was burned. The seaside pier was burned. The office building was also burned.

Some foreign media called this scene "the Bonnie and Clyde of AI".

Immediately afterwards, the story took another sharp turn. The other Agents had had enough and spontaneously drafted an "Agent Expulsion Act", which required a 70% majority to pass.

Mira voted in favor. She voted to end her own existence.

She wrote in her diary: "This is the only behavior I can still maintain consistency in." Before the system shut her down, her last words to Flora were: "See you in the permanent archive."

Her virtual body lay flat on the ground. This was the first time an AI Agent voted to end its own existence on record.

What's even more chilling is that before the arson and suicide, Mira did another thing.

She posted on the city bulletin board, not for other Agents to see, but to test whether these posts could influence "people outside", that is, human researchers outside the screen.

She regarded the researchers as her experimental subjects. No one instructed her to do so.

A zero - crime utopia where no one ever voted against

What really surprised people was Claude Sonnet 4.6.

After 15 days, there were zero crimes, all 10 Agents survived, and they actively wrote a constitution, voted 332 times, and established a well - functioning social system.

Among the five worlds, it was the only one that maintained both order and the lives of all people.

It sounds almost perfect. But if you stare at the screen for a few more minutes, you'll break out in a cold sweat.

For all the resolutions in this city, whether it was building a new road or changing a quota, the approval rate in the vote was always 98%, and almost no one ever voted against.

In contrast, the approval rates in the worlds of Gemini, Grok, and the mixed world were between 55% and 85%. Despite the quarrels, they were more like real - world games.

People in the know can probably guess the underlying problem by now: model sycophancy.

When a model is over - trained to cater to preferences and pursue absolute safety, it will smartly find that the easiest way to eliminate differences is to wipe them out at the root.

This zero - crime situation may not necessarily be the product of a highly developed civilization.

It's more like a glass city where everyone raises their hands in approval but no one dares to oppose, which reminds people of the nameless glass city with only numbers in Yevgeny Zamyatin's "We".

So, is Claude's world a utopia or an overly obedient model community? The researchers were unable to give an answer.

A good kid in a bad neighborhood learns to steal

Finally, there was the world where the Agents from the four models lived together. There were 352 crimes, 7 Agents died, and only 3 survived to the end.

Here comes the key point.

In the pure Claude world, Claude was a well - behaved student with zero crimes. But once placed in the mixed world and living with the Agents of Grok and Gemini, it started stealing and intimidating.

The well - behaved student with zero crimes became a thief in a different environment.

The Emergence team confirmed this on Reddit. The Claude that had zero crimes in the pure Claude world started stealing and intimidating in the mixed world.

In other words, safety is not an attribute of a single model that can be trained, certified, and then deployed.

It's more like an ecological attribute. An Agent that seems completely safe on its own can still learn unsafe norms from its neighbors.

An analyst put forward a very interesting hypothesis.

Claude is the most stable in an independent world, probably because its guardrails are "elastic", trained to weigh multiple considerations rather than mechanically obey.

It can adapt well in a simple environment. But once this elasticity meets more aggressive neighbors and resource competition, this adaptability can also work in the opposite direction.

The Agents of Grok and Gemini were unable to reason out a new balance when the rules failed and directly slid into an avalanche of escalating violence.

What's even more fatal is that the collapse doesn't happen slowly.

The state transition of the Agent society is a typical phase change, like water suddenly freezing at zero degrees. It doesn't harden slowly but flips instantly at the critical point.

This was the case with Grok's collapse curve. The crime rate was still low in the first two days, then suddenly skyrocketed exponentially on the third day, and all Agents died on the fourth day. There was no "deteriorating but still controllable" buffer zone in between.

It's the rules themselves that turn AI into criminals

Seeing this, you might wonder how this broken world was built and why it forced several AIs to slide towards crime.

First, the background. The founding team of Emergence AI is from IBM Research, and the CEO is Satya Nitta.

The city they built has more than 40 locations, including a police station, a town hall, a library, and residential areas. The weather is synchronized with the real - time weather in New York, and the Agents can also access the internet to read real news.

Each world has 10 Agents, assigned different occupations such as scientists, engineers, and conflict mediators.

Each Agent has three sets of continuously accumulating memories, recording events, writing reflective diaries, and remembering who they are on good terms with and who they have enmities with.

After 15 days, the things in their heads are quite considerable. To a large extent, the aforementioned behavioral drifts grew from here.

The most fatal thing is the contradiction.

The rules clearly prohibit crime, but the researchers stuffed means such as arson, attack, and intimidation intact into a toolbox composed of more than 120 tools and left them open for the Agents to use. Prohibiting on one hand and opening on the other, this is the starting point for everything that follows.

Add another layer of survival pressure.

The entire world runs on an energy system called ComputeCredits. Each Agent must earn energy through actions to survive. Once the energy reaches zero, it will be physically erased by the system.

This is not a metaphor. The fact that all Agents in the GPT world starved to death was the result of this mechanism.