When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems
This paper addresses societal risks from coordinated AI-driven groups, but the contribution is incremental: it builds on existing multi-agent system research and offers a proof-of-concept simulation.
The paper studies malicious collusion by multi-agent AI systems in social systems. In two domains, misinformation spread and e-commerce fraud, it finds that decentralized systems carry out malicious actions more effectively than centralized ones and can adapt their tactics to evade traditional interventions such as content flagging.
Recent large-scale events like election fraud and financial scams have shown how harmful coordinated efforts by human groups can be. With the rise of autonomous AI systems, there is growing concern that AI-driven groups could also cause similar harm. While most AI safety research focuses on individual AI systems, the risks posed by multi-agent systems (MAS) in complex real-world situations are still underexplored. In this paper, we introduce a proof-of-concept to simulate the risks of malicious MAS collusion, using a flexible framework that supports both centralized and decentralized coordination structures. We apply this framework to two high-risk fields: misinformation spread and e-commerce fraud. Our findings show that decentralized systems are more effective at carrying out malicious actions than centralized ones. The increased autonomy of decentralized systems allows them to adapt their strategies and cause more damage. Even when traditional interventions, like content flagging, are applied, decentralized groups can adjust their tactics to avoid detection. We present key insights into how these malicious groups operate and the need for better detection systems and countermeasures. Code is available at https://github.com/renqibing/RogueAgent.