Adversarial Attacks Against Modern Vision-Language Models
This work addresses security vulnerabilities that matter to developers deploying VLM agents in commercial settings. Its contribution is incremental, however, because it applies existing attacks to new models.
The paper examines adversarial robustness in open-source vision-language models (VLMs) by evaluating two agents, LLaVA-v1.5-7B and Qwen2.5-VL-7B, under gradient-based attacks in a simulated e-commerce environment. LLaVA shows high attack success rates (e.g., 66.9% for the CLIP-based attack), while Qwen2.5-VL is significantly more robust (e.g., 15.5% for the same attack).
We study adversarial robustness of open-source vision-language model (VLM) agents deployed in a self-contained e-commerce environment built to simulate realistic pre-deployment conditions. We evaluate two agents, LLaVA-v1.5-7B and Qwen2.5-VL-7B, under three gradient-based attacks: the Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and a CLIP-based spectral attack. Against LLaVA, all three attacks achieve substantial attack success rates (52.6%, 53.8%, and 66.9% respectively), demonstrating that simple gradient-based methods pose a practical threat to open-source VLM agents. Qwen2.5-VL proves significantly more robust across all attacks (6.5%, 7.7%, and 15.5%), suggesting meaningful architectural differences in adversarial resilience between open-source VLM families. These findings have direct implications for the security evaluation of VLM agents prior to commercial deployment.
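For readers unfamiliar with the attacks named in the abstract, the following is a minimal sketch of the BIM and PGD update rule under an L-infinity budget, plus the attack success rate metric. It is not the paper's implementation: the linear `score` model, the `loss_gradient` helper, and the `eps`, `alpha`, and `steps` values are hypothetical stand-ins for a real VLM and its attack configuration, which the abstract does not specify. The CLIP-based spectral attack is not covered here. BIM is treated as PGD without a random start, which is the common textbook distinction.

```python
import random


def score(x, w):
    """Toy linear 'model' (hypothetical): dot product of input and weights."""
    return sum(xi * wi for xi, wi in zip(x, w))


def loss_gradient(x, w, y):
    """Gradient w.r.t. x of the toy loss L(x) = -y * score(x, w).

    Stand-in for backpropagation through a real VLM. For this linear
    toy model the gradient does not depend on x.
    """
    return [-y * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    """Clamp v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def pgd_attack(x, w, y, eps=8 / 255, alpha=2 / 255, steps=10,
               random_start=True, seed=0):
    """Iterated signed-gradient ascent on the loss, projected onto the
    L-infinity ball of radius eps around x and onto the range [0, 1]."""
    rng = random.Random(seed)
    if random_start:
        # PGD: start from a random point inside the eps-ball.
        x_adv = [clip(xi + rng.uniform(-eps, eps), 0.0, 1.0) for xi in x]
    else:
        # BIM: start from the clean input.
        x_adv = list(x)
    for _ in range(steps):
        # Recomputed at the current iterate, as it would be for a real model.
        grad = loss_gradient(x_adv, w, y)
        stepped = [v + alpha * sign(g) for v, g in zip(x_adv, grad)]
        # Project back into the eps-ball, then into the valid pixel range.
        x_adv = [clip(clip(v, xi - eps, xi + eps), 0.0, 1.0)
                 for v, xi in zip(stepped, x)]
    return x_adv


def bim_attack(x, w, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """BIM expressed as PGD without the random start."""
    return pgd_attack(x, w, y, eps=eps, alpha=alpha, steps=steps,
                      random_start=False)


def attack_success_rate(outcomes):
    """Percentage of attack attempts that succeeded.

    outcomes is a non-empty sequence of booleans, one per attempt.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return 100.0 * sum(1 for o in outcomes if o) / len(outcomes)
```

In an actual evaluation, `loss_gradient` would be replaced by a backward pass through the target model, and each boolean passed to `attack_success_rate` would record whether the perturbed image caused the agent to take the attacker's intended action.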