Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
This work addresses security vulnerabilities in multi-agent AI systems, an incremental but important step toward reliable deployment on complex tasks.
The paper tackles the lack of robustness of Vision Language Model (VLM)-based multi-agent systems against adversarial attacks. It proposes Cowpox, a defense approach that protects system integrity by limiting the spread of infection and improving agents' recovery rate, supported by empirical demonstrations and theoretical guarantees.
Vision Language Model (VLM)-based agents are stateful, autonomous entities capable of perceiving and interacting with their environments through vision and language. Multi-agent systems comprise specialized agents that collaborate to solve a complex task. A core security property is robustness: the system should maintain its integrity under adversarial attacks. However, existing multi-agent systems are not designed with robustness in mind, so a successful exploit against one agent can spread to and infect other agents, undermining the assurance of the entire system. To address this, we propose a new defense approach, Cowpox, that provably enhances the robustness of multi-agent systems. It incorporates a distributed mechanism that improves the recovery rate of agents by limiting the expected number of infections passed on to other agents. The core idea is to generate and distribute a special cure sample that immunizes an agent against the attack before exposure and helps already infected agents recover. We demonstrate the effectiveness of Cowpox empirically and provide theoretical robustness guarantees.
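To make the infection-and-cure dynamic described above concrete, the following is a minimal toy simulation, not the paper's algorithm. Everything in it is an assumption made for illustration: the random pairwise-chat model, the three states, the function name `simulate`, the probabilities `p_infect` and `p_cure`, and the notion of hypothetical "defender" agents that produce a cure on contact with an infected peer. It only illustrates why a cure that spreads through the same channel as the attack can bound the number of infected agents.

```python
import random


def simulate(num_agents, rounds, p_infect, p_cure, num_defenders, seed=0):
    """Toy pairwise-chat model; returns the infected count after each round.

    States: 'S' = susceptible, 'I' = infected, 'C' = holds the cure (immune).
    Agent 0 starts infected. Agents 1..num_defenders are hypothetical
    defender agents (an assumption of this sketch, not the paper's design).
    """
    rng = random.Random(seed)
    state = ["S"] * num_agents
    state[0] = "I"
    defenders = set(range(1, 1 + num_defenders))
    history = []
    for _ in range(rounds):
        order = list(range(num_agents))
        rng.shuffle(order)
        # Pair consecutive agents in the shuffled order; an odd agent sits out.
        for a, b in zip(order[0::2], order[1::2]):
            for src, dst in ((a, b), (b, a)):
                if state[src] == "C" and state[dst] != "C":
                    # A cure carrier passes the cure on, which immunizes a
                    # susceptible peer or recovers an infected one.
                    if rng.random() < p_cure:
                        state[dst] = "C"
                elif state[src] == "I" and state[dst] == "S":
                    if dst in defenders:
                        # The defender detects the attack and crafts a cure.
                        state[dst] = "C"
                    elif rng.random() < p_infect:
                        state[dst] = "I"
        history.append(state.count("I"))
    return history


if __name__ == "__main__":
    undefended = simulate(64, 30, p_infect=0.8, p_cure=0.9, num_defenders=0)
    defended = simulate(64, 30, p_infect=0.8, p_cure=0.9, num_defenders=3)
    print("infected per round, no defenders:  ", undefended)
    print("infected per round, three defenders:", defended)
```

Without defenders, the infected count in this toy model can only grow, because nothing ever removes an infection. With even a few defenders, the cure state is absorbing and spreads through the same chats as the attack, so the infection can be pushed back; the exact trajectories depend on the seed and the assumed probabilities.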