Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks
This addresses a critical security concern for AI agents in operating systems, particularly in mobile environments, by identifying novel vulnerabilities and demonstrating high attack success, though it is incremental in focusing on specific interaction mechanisms.
The paper tackles the security problem of AI agents being vulnerable to active environmental injection attacks (AEIA), where attackers disguise malicious elements to manipulate decision-making, and finds that even advanced multimodal agents are highly susceptible, with a maximum attack success rate of 93% on the AndroidWorld benchmark.
As researchers continue to optimize AI agents for more effective task execution within operating systems, they often overlook a critical security concern: the ability of these agents to detect "impostors" within their environment. Through an analysis of the agents' operational context, we identify a significant threat-attackers can disguise malicious attacks as environmental elements, injecting active disturbances into the agents' execution processes to manipulate their decision-making. We define this novel threat as the Active Environment Injection Attack (AEIA). Focusing on the interaction mechanisms of the Android OS, we conduct a risk assessment of AEIA and identify two critical security vulnerabilities: (1) Adversarial content injection in multimodal interaction interfaces, where attackers embed adversarial instructions within environmental elements to mislead agent decision-making; and (2) Reasoning gap vulnerabilities in the agent's task execution process, which increase susceptibility to AEIA attacks during reasoning. To evaluate the impact of these vulnerabilities, we propose AEIA-MN, an attack scheme that exploits interaction vulnerabilities in mobile operating systems to assess the robustness of MLLM-based agents. Experimental results show that even advanced MLLMs are highly vulnerable to this attack, achieving a maximum attack success rate of 93% on the AndroidWorld benchmark by combining two vulnerabilities.