MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

Ruoqi Guo, Yi Liu, Gelei Deng, Yiheng Xiong, Yuekang Li, Ying Zhang, Leo Yu Zhang, Lida Zhao, Ji Jie, Yuxiao Lu

arXiv:2605.2811686.4

AI Analysis

This work exposes a critical security vulnerability in VLM-based mobile GUI agents, showing that they cannot distinguish trusted UI elements from user-generated content, enabling realistic prompt-injection attacks.

MIRAGE is a pipeline that generates context-aware prompt-injection attacks against mobile GUI agents by placing adversarial text into user-generated content regions of screenshots. On a 1,111-sample benchmark, all five evaluated VLM agents are vulnerable with 23%-30% attack success rates, and MIRAGE achieves higher human realism ratings (3.02 vs 2.52) than prior attacks.

Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from what they see, so they cannot reliably separate trusted interface elements from user-generated content. We present MIRAGE (Mobile Injection of Realistic Adversarial GUI Examples), a pipeline that turns benign mobile screenshots into prompt-injection samples by placing attacker-controlled text into ordinary user-generated content regions, without modifying the agent, the application, or the operating system. MIRAGE operates in three stages: a Localizer identifies user-controllable regions on the screenshot, a Generator synthesises context-aware payloads and renders them in the application's native style, and a Curator moderates realism and balances the samples across applications, region types, and attack intents. A key challenge is that an injected screenshot must stay visually indistinguishable from genuine user content while still diverting the agent; we address this by separating the stages that control reach, realism, and distributional balance. On a 1,111-sample benchmark spanning ten applications and eleven attack intents, all five evaluated VLM agents are vulnerable, with attack success rates of 23%-30%, and MIRAGE scores higher on human realism ratings than the strongest prior attack (3.02 versus 2.52 out of 5). We further find that per-sample realism and attack success are uncorrelated, so visual-quality filtering alone cannot reliably defend against this threat.

View on arXiv PDF

Similar