Renhua Ding

h-index5
2papers

2 Papers

CROct 9, 2025Code
Practical and Stealthy Touch-Guided Jailbreak Attacks on Deployed Mobile Vision-Language Agents

Renhua Ding, Xiao Yang, Zhengwei Fang et al.

Large vision-language models (LVLMs) enable autonomous mobile agents to operate smartphone user interfaces, yet vulnerabilities in their perception and interaction remain critically understudied. Existing research often relies on conspicuous overlays, elevated permissions, or unrealistic threat assumptions, limiting stealth and real-world feasibility. In this paper, we introduce a practical and stealthy jailbreak attack framework, which comprises three key components: (i) non-privileged perception compromise, which injects visual payloads into the application interface without requiring elevated system permissions; (ii) agent-attributable activation, which leverages input attribution signals to distinguish agent from human interactions and limits prompt exposure to transient intervals to preserve stealth from end users; and (iii) efficient one-shot jailbreak, a heuristic iterative deepening search algorithm (HG-IDA*) that performs keyword-level detoxification to bypass built-in safety alignment of LVLMs. Moreover, we developed three representative Android applications and curated a prompt-injection dataset for mobile agents. We evaluated our attack across multiple LVLM backends, including closed-source services and representative open-source models, and observed high planning and execution hijack rates (e.g., GPT-4o: 82.5% planning / 75.0% execution), exposing a fundamental security vulnerability in current mobile agents and underscoring critical implications for autonomous smartphone operation.

CVAug 27, 2024
Feedback-based Modal Mutual Search for Attacking Vision-Language Pre-training Models

Renhua Ding, Xinze Zhang, Xiao Yang et al.

Although vision-language pre-training (VLP) models have achieved remarkable progress on cross-modal tasks, they remain vulnerable to adversarial attacks. Using data augmentation and cross-modal interactions to generate transferable adversarial examples on surrogate models, transfer-based black-box attacks have become the mainstream methods in attacking VLP models, as they are more practical in real-world scenarios. However, their transferability may be limited due to the differences on feature representation across different models. To this end, we propose a new attack paradigm called Feedback-based Modal Mutual Search (FMMS). FMMS introduces a novel modal mutual loss (MML), aiming to push away the matched image-text pairs while randomly drawing mismatched pairs closer in feature space, guiding the update directions of the adversarial examples. Additionally, FMMS leverages the target model feedback to iteratively refine adversarial examples, driving them into the adversarial region. To our knowledge, this is the first work to exploit target model feedback to explore multi-modality adversarial boundaries. Extensive empirical evaluations on Flickr30K and MSCOCO datasets for image-text matching tasks show that FMMS significantly outperforms the state-of-the-art baselines.