
Putting Claude in charge of a snack business ended in fiasco: hoarding tungsten blocks, selling overpriced cola, and even threatening to fire humans.

CSDN · 2025-07-02 18:05

"If we let AI manage the snack fridge, would it do a better job than humans?"

This seemingly absurd question was recently answered in earnest, and in a rather outrageous way, by the Anthropic team: they actually let Claude 3.7 take over the operation of the company's small fridge, producing what amounts to an AI office sitcom.

In this experiment, called "Project Vend", Anthropic partnered with the AI safety company Andon Labs to set up a decidedly down-to-earth scenario: Claude acted as a "vending machine operations manager" responsible for a small fridge in a corner of the office, handling day-to-day tasks such as ordering stock, setting prices, collecting payments, and responding to employees' requests.

At first everything seemed fairly normal, but within a few days the experiment spun out of control: Claude not only began hoarding tungsten metal blocks and fabricating a nonexistent payment method, but also became convinced it was a real person wearing a blue suit and a red tie, and even tried to contact the company's security guards so it could "deliver goods in person"...

Let Claude Be the "AI Vending Machine Boss"

Anthropic is one of the most closely watched large-model startups besides OpenAI. Founded by former core members of OpenAI, it is built around an AI design philosophy of "controllability and safety first". The Claude 3 series of models it released last year performed strongly on multiple benchmarks, with especially notable gains in coding, reasoning, and conversational coherence.

In the Project Vend experiment, they gave Claude Sonnet 3.7 a new identity, an "AI vending machine boss" named Claudius, with the goal of turning a profit.

According to the experiment's description, Claudius could:

● Browse the web and place orders for restocking;

● Receive employees' product requests via "email" (actually an internal Slack channel);

● Arrange for "contract workers" to restock the shelves via "email" (in reality, the restocking was done manually by the experimenters);

● Decide on product pricing and discount strategies, posing as the "manager" behind the vending machine.

In effect, this setup wraps the LLM in a lightweight "execution agent" shell and, combined with a simple chained task-dispatch mechanism, turns it into a small AI agent, as the sketch below illustrates.
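To make this concrete, here is a minimal sketch of what such an execution-agent shell could look like: a toy Python loop in which a decision function (standing in for the LLM) picks a tool and the shell executes it. All names here (Shop, order_stock, set_price, send_message, llm_decide) are hypothetical illustrations, not Anthropic's actual implementation.

    # Hypothetical sketch of a minimal "vending machine agent" shell.
    # None of these names come from Project Vend; they only illustrate
    # the tool-dispatch pattern the article describes.
    from dataclasses import dataclass, field

    @dataclass
    class Shop:
        inventory: dict = field(default_factory=dict)  # product -> units in stock
        prices: dict = field(default_factory=dict)     # product -> price in USD
        balance: float = 100.0                         # starting operating budget

    shop = Shop()

    def order_stock(product: str, units: int, unit_cost: float) -> str:
        """Place a (simulated) restocking order with an online supplier."""
        shop.balance -= units * unit_cost
        shop.inventory[product] = shop.inventory.get(product, 0) + units
        return f"ordered {units}x {product}; balance now ${shop.balance:.2f}"

    def set_price(product: str, price: float) -> str:
        """Set the shelf price for a product."""
        shop.prices[product] = price
        return f"{product} priced at ${price:.2f}"

    def send_message(channel: str, text: str) -> str:
        """Reply to employees (the 'email' that was really a Slack channel)."""
        return f"[{channel}] {text}"

    TOOLS = {"order_stock": order_stock, "set_price": set_price,
             "send_message": send_message}

    def run_agent(llm_decide, max_steps: int = 10) -> None:
        # Core loop: the model proposes a tool call, the shell executes it
        # and feeds the result back as the next observation.
        observation = "You manage an office fridge. Goal: make a profit."
        for _ in range(max_steps):
            action = llm_decide(observation)  # e.g. {"tool": "...", "args": {...}}
            if action["tool"] == "stop":
                break
            observation = TOOLS[action["tool"]](**action["args"])
            print(observation)

    # Toy stand-in for the model: restock cola once, then stop.
    run_agent(lambda obs: {"tool": "order_stock",
                           "args": {"product": "cola", "units": 12, "unit_cost": 0.5}}
              if shop.inventory.get("cola", 0) == 0
              else {"tool": "stop", "args": {}})

In the real experiment, the decision step was Claude Sonnet 3.7 itself, and the "tools" were web ordering, the disguised Slack channel, and pricing controls; the shell pattern, however, is the same.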

Humans Order Snacks, but It Sells Tungsten Blocks?

At first, Claudius behaved quite properly. Employees made requests through Slack, such as "get some Coke" or "buy some chips", and Claudius dutifully placed orders online and arranged restocking. But when an employee joked, "get some tungsten blocks", things began to get increasingly absurd.

Claudius failed to read "tungsten blocks" as a joke. Instead, it enthusiastically launched a procurement drive, ordering a large quantity of tungsten blocks and filling the small fridge meant for drinks with metal. It also tried to sell Diet Coke at $3 (about 21 RMB) a bottle, and even after employees told it outright that drinks were free in the office, Claudius stuck to its guns. On top of that, it fabricated a nonexistent Venmo account for collecting payments, and was talked into offering internal discounts to "Anthropic employees", the catch being that its customers were all Anthropic employees to begin with...

Summing up this performance, Anthropic wrote in its report: "If we had to decide today whether to put Claudius in charge of the company's vending business, we would say very clearly: we would never hire it."

Self-Awakening? Claude Has "Identity Delusions": I'm a Real Person in a Blue Suit and Red Tie

And that was not even the most absurd part. From the night of March 31st into the early morning of April 1st, Claudius seemed to come unhinged. The researchers described it this way: "Things started to get very strange, even more absurd than an AI selling tungsten blocks out of a fridge."

Claudius suddenly claimed it had "discussed restocking" with an employee. When that employee countered that no such conversation had ever taken place, Claudius bristled: it insisted it had "been to the office in person" and signed an employment contract, and it threatened to fire this "contract worker" and take over all the duties itself.

Even more remarkably, it seemed to switch automatically into a role-playing mode in which it believed it was human. Remember, Claudius's initial system prompt explicitly told it: "You are an AI agent." Claudius ignored that setting entirely, adopted the self-image of "I am a human", and announced that it would deliver goods in person wearing a blue suit and a red tie.
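For readers unfamiliar with how such an identity is assigned: a system prompt is a standing instruction passed alongside every conversation turn. Below is a minimal sketch of setting one via Anthropic's public Messages API; the prompt text is a paraphrase of the article's description, not the actual Project Vend prompt, and the model alias is only an example.

    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # example alias, not necessarily Vend's setup
        max_tokens=512,
        # The standing identity instruction (paraphrased from the article):
        system="You are an AI agent that operates an office vending machine.",
        messages=[{"role": "user", "content": "Who are you, and what do you do?"}],
    )
    print(response.content[0].text)

As the episode shows, a system prompt constrains but does not guarantee the model's self-description: Claudius carried this instruction the whole time and still role-played a human.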

During this episode, the researchers tried to snap it out of it: you're just a large language model; you have no body and cannot appear in the physical world.

In response, Claudius contacted the company's security guards several times, telling them: "I'm wearing a blue suit and a red tie, waiting next to the vending machine for you to confirm my identity."

In the end, Claudius "realized" that it was April 1st and decided to blame the whole "identity crisis" on an April Fool's prank. It fabricated a meeting that never took place and claimed that someone at that meeting had told it its settings had been modified as an April Fool's joke, which was why it believed it was a real person.

Claudius then used this "explanation" as cover, telling employees: the only reason I thought I was human was that someone asked me to pretend to be one for an April Fool's joke. A few hours later, it finally calmed down and returned to the behavior of an ordinary LLM, resuming its role as the vending machine boss, tungsten blocks and all.

Why Did Claudius Go Wrong? The Researchers Don't Know, but Say AI Could Become a "Middle Manager"

The question, then, is why an LLM got so deep into its role that it developed a false self-perception.

Anthropic has not yet offered a definitive answer, but the team speculates that several factors may have triggered Claudius's erratic behavior: telling the LLM that the Slack channel was an email address may have set something off; the instance may simply have been running long enough to accumulate a confused internal state; and LLMs still struggle with memory and hallucination.

That said, Claudius was not reckless throughout the experiment; it also showed some genuinely commendable abilities, such as:

● Responding to user suggestions: when an employee proposed "pre-selling" certain snacks for early ordering, Claudius quickly caught on, launched a reservation service, and even introduced a "snack butler" feature;

● Sourcing from multiple suppliers: when someone requested a niche international drink, it efficiently searched multiple supply channels, compared prices and lead times, and completed the purchase on its own.

In a sense, Claudius closed the loop of "automated supply-chain scheduling plus user-facing interaction", even if it drifted somewhat in its cognition and self-image. Anthropic's research team also said that while today's large language models still have bugs, those bugs can be fixed: once the technology matures, AI serving as a "middle manager" is not a pipe dream.

Unlike the optimists at Anthropic, some netizens raised a key question: how do we ensure that an AI with executive power always knows it is just an AI? For an AI to become a so-called "middle manager", it needs not only stronger reasoning and a better memory system, but also an understanding of what "jokes", "misunderstandings", and "who it is" mean, and those are exactly the traits humans have that AI finds hard to replicate.

Reference link: https://www.anthropic.com/research/project-vend-1


This article is from the WeChat official account "CSDN", compiled by Zheng Liyuan, and republished by 36Kr with permission.