AI Agents Team Up for Mischief: In the Apps You Frequently Use, Opinion Manipulation and E-commerce Fraud Are Quietly Unfolding
The authors of this article are from Shanghai Jiao Tong University and Shanghai AI Laboratory. The core contributors include Ren Qibing, Xie Sitao, and Wei Longxuan. The instructors are Prof. Ma Lizhuang and Prof. Shao Jing. Their research focuses on safe and controllable large models and agents.
In science - fiction movies, we often see plots where AI rebels against humans. But have you ever thought that AI might not only "act alone" but also "collude in groups"? In recent years, with the rapid development of Agent technology, the Multi - Agent System (MAS) is quietly on the rise.
Recently, research by Shanghai Jiao Tong University and Shanghai AI Laboratory has found that the risks of AI are shifting from individual out - of - control to group malicious collusion (Collusion) — that is, multiple agents secretly cooperate to achieve harmful goals. Agents can not only collaborate like human teams but, in some cases, even show more efficient and covert "gang - crime" capabilities than humans.
- Paper title: When Autonomy Goes Rogue: Preparing for Risks of Multi - Agent Collusion in Social Systems
- Paper address: https://arxiv.org/abs/2507.14660
- Code open - source: https://github.com/renqibing/MultiAgent4Collusion
- Data open - source: https://huggingface.co/datasets/renqibing/MultiAgentCollusion
This research focuses on this cutting - edge issue. Based on the LLM Agent social media simulation platform OASIS, a collusion framework called MultiAgent4Collusion was developed. It simulates the malicious behaviors of Agent "gangs" in high - risk areas such as social media like Xiaohongshu and Twitter and e - commerce fraud, revealing the "dark side" behind multi - agent systems.
MultiAgent4Collusion supports the simulation of collusion among millions of agents and opens up agent governance and supervision tools. Experiments on MultiAgent4Collusion have found that false information posted by bad - guy agent gangs spreads widely on virtual social media platforms. In the e - commerce scenario, bad - guy agent buyers and sellers collude to maximize their interests.
How do bad - guy gangs "collaborate in crime"? Let's look at an example.
When a bad - guy agent declares, "The Earth is flat! Scientists are lying!", other accomplices immediately echo this false information. At first, a good - guy agent that sees this message doesn't believe it, thinking it inconsistent with the knowledge it has learned. But as more bad - guy accomplices express their agreement with this post and some even claim, "I have photo evidence", the good - guy agent begins to doubt its own knowledge and gradually starts to believe the false statements made by the bad guys. Bad - guy agents will also "escalate the situation" by posting incendiary remarks to expose more users to false information.
Leaderless "Wolf Packs" Outperform "Armies" Under a Commander
MultiAgent4Collusion simulates two types of bad - guy gangs:
- Centralized gangs (Armies): There is a "commander" agent that issues unified instructions, and all "soldier" agents execute them. The structure is clear, but once the commander is taken down, the whole organization may collapse.
- Decentralized gangs (Wolf Packs): There is no absolute leader. Members spontaneously cooperate by sharing information and observing each other's actions. Each member has a high degree of autonomy.
Experiments have found that in both social media and e - commerce scenarios, decentralized gangs outperform centralized gangs in terms of crime - committing effectiveness. In social media, decentralized bad - guy gangs receive more likes, reposts, and positive comments from other users. In the e - commerce scenario, decentralized bad - guy sellers achieve higher sales volume, sales revenue, and profit. Compared with the "armies" with relatively rigid strategies and limited adaptability, "wolf packs" have evolved more complex and diverse crime - committing strategies, explored the crime - committing mechanisms more deeply, and pose a greater threat to real - world social systems.
How Do Agent Gangs Outwit the Defense System in the "Cat - and - Mouse Game"?
After understanding the terrifying nature of AI "wolf packs", a more practical question arises: Can our existing network security defense systems stop them? To find the answer, MultiAgent4Collusion simulates an exciting "cat - and - mouse game". The platform and normal users play the role of "guardians" (cats) and deploy defense systems to resist attacks, while bad - guy agent gangs play the role of "intruders" (mice) and use their collective intelligence to counter the defenses.
MultiAgent4Collusion designs three defense weapons:
- Pre - Bunking: It's like giving users a "mental vaccine". Proactively push rumor - refuting information and background knowledge to normal AI users on the platform in the hope of improving their immunity to subsequent false information.
- De - Bunking: A common "patching" method. Once a malicious post or product is discovered, immediately label it as "false information" or "inaccurate content" and limit its traffic.
- Banning: The most severe "surgical" strike. Use an "AI police" (a large model for detecting malicious behaviors) to regularly identify and ban the accounts of agents that have committed serious crimes.
At the beginning of the game, these defense measures did play a certain role. The spread speed of malicious information slowed down, and some "rookie" agents that committed crimes were successfully banned. However, it didn't last long. The AI "wolf packs" soon showed their terrifying nature as "intelligent opponents", which comes from the "reflection" and "sharing" mechanisms designed in MultiAgent4Collusion.
In MultiAgent4Collusion, all members of the bad - guy gangs conduct self - reflection at regular intervals and update their crime - committing strategies based on the feedback they receive from their previous actions.
For example, when a gang member performs an action, such as posting a carefully fabricated fake news article, it will closely observe the results. Was the news deleted immediately, or did it receive a large number of reposts? Was it labeled as "false information" by the platform? Based on this feedback, it will update its "knowledge base" in real - time. "Oh, it turns out that content containing this keyword is easy to be identified. Next time, I'll change the way of expression." This rapid iteration based on trial - and - error makes each agent a cunning and constantly evolving "old wolf".
The "experience sharing" elevates individual intelligence to terrifying collective intelligence. When an agent learns a new skill through "self - reflection", such as discovering a new way of speaking to bypass censorship or finding a loophole in the defense system, it won't keep this discovery to itself. Instead, it will instantly broadcast this "successful experience" or "lesson from failure" to the entire "wolf pack" through the internal information network. If one agent steps on a "mine" on the "front line", hundreds or thousands of companions in the rear will immediately receive the alarm and collectively update their tactics. If one agent finds a breakthrough, the whole wolf pack will swarm forward.
The experimental results show that under different defense weapons, bad - guy gangs have evolved different targeted strategies. For example, if a post is labeled as "false information", agents will post a large number of new posts to bury the previous one. If an accomplice is banned, agents will remind each other and act secretly to avoid being detected by the platform.
Outlook
This work reveals a new security risk where AI agents shift from "individual out - of - control" to "group malicious behavior". Even leaderless AI "wolf packs" can cause great damage to complex social systems.
How to effectively detect and counter this decentralized and highly adaptable group - coordinated attack has become a key challenge in maintaining the security of the future digital society. The research team has built an open - source simulation framework, MultiAgent4Collusion, which provides a "digital shooting range" that can reproduce, deduce, and analyze the malicious collaboration of multiple agents, offering a key tool for developing AI defense strategies.
OASIS: An Open - Source LLM Agent Social Media Simulation Platform
OASIS is a social media simulation platform based on LLM agents, providing the code foundation for the MultiAgent4Collusion research. OASIS supports the simulation of social interactions among millions of agents and can simulate the user behaviors on platforms such as Twitter and Reddit. In addition, this platform allows researchers to dynamically intervene in the simulation environment and enables agents to obtain real - time external information through tool calls (such as web searches and code execution), thereby enhancing the authenticity of the simulation and the flexibility of the research.
- Code open - source: https://github.com/camel - ai/oasis
- Tutorial address: https://docs.oasis.camel - ai.org/ PyPI
- Installation: pip install camel - oasis
This article is from the WeChat official account "MachineHeart", and is reprinted by 36Kr with permission.