CV | Aug 19, 2025

Enhancing Targeted Adversarial Attacks on Large Vision-Language Models via Intermediate Projector

arXiv:2508.13739v2 | 2 citations | h-index: 5
Originality: Incremental advance
AI Analysis

This paper examines safety risks in deployed vision-language models by strengthening targeted adversarial attacks; it is an incremental advance over existing methods.

The paper tackles targeted adversarial attacks on large vision-language models by proposing a black-box attack framework that leverages the intermediate projector (Q-Former) to enhance attack effectiveness and granularity. Results show that the method significantly outperforms baselines in global targeted attacks and achieves superior success rates with better content preservation in fine-grained attacks, with effective transfer to commercial models like Google Gemini and OpenAI GPT.

The growing deployment of Large Vision-Language Models (VLMs) raises safety concerns, as adversaries may exploit model vulnerabilities to induce harmful outputs, with targeted black-box adversarial attacks posing a particularly severe threat. However, existing methods primarily maximize encoder-level global similarity, which lacks the granularity for stealthy and practical fine-grained attacks, where only a specific target should be altered (e.g., modifying a car while preserving its background). Moreover, they largely neglect the projector, a key semantic bridge in VLMs for multimodal alignment. To address these limitations, we propose a novel black-box targeted attack framework that leverages the projector. Specifically, we utilize the widely adopted Querying Transformer (Q-Former), which transforms global image embeddings into fine-grained query outputs, to enhance attack effectiveness and granularity. For standard global targeted attack scenarios, we propose the Intermediate Projector Guided Attack (IPGA), which aligns Q-Former fine-grained query outputs with the target to enhance attack strength and exploits the intermediate pretrained Q-Former that is not fine-tuned for any specific Large Language Model (LLM) to improve attack transferability. For fine-grained attack scenarios, we augment IPGA with the Residual Query Alignment (RQA) module, which preserves unrelated content by constraining non-target query outputs to enhance attack granularity. Extensive experiments demonstrate that IPGA significantly outperforms baselines in global targeted attacks, and IPGA with RQA (IPGA-R) attains superior success rates and unrelated content preservation over baselines in fine-grained attacks. Our method also transfers effectively to commercial VLMs such as Google Gemini and OpenAI GPT.
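
The abstract describes two ingredients: aligning Q-Former query outputs with the target (IPGA) and constraining non-target query outputs to preserve unrelated content (RQA). The sketch below illustrates how such a combined objective might look. It is not the authors' implementation: the abstract does not give the loss form, so the cosine-similarity terms, the `target_idx` partition of queries, the weight `lam`, and the function name `ipga_r_objective` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def ipga_r_objective(adv_queries, clean_queries, target_emb, target_idx, lam=1.0):
    """Hypothetical score that an attacker would maximize.

    adv_queries   -- query outputs for the adversarial image (list of vectors)
    clean_queries -- query outputs for the clean image (same shape)
    target_emb    -- embedding of the target description
    target_idx    -- indices of queries treated as target-related (assumed given)
    lam           -- weight of the preservation term (assumed hyperparameter)
    """
    target_idx = set(target_idx)
    if not target_idx:
        raise ValueError("target_idx must contain at least one query index")

    # Alignment term (IPGA-like): pull target-related queries toward the target.
    align = sum(cosine(adv_queries[i], target_emb) for i in target_idx) / len(target_idx)

    # Residual term (RQA-like): keep the remaining queries close to their clean values.
    rest = [i for i in range(len(adv_queries)) if i not in target_idx]
    preserve = 0.0
    if rest:
        preserve = sum(cosine(adv_queries[i], clean_queries[i]) for i in rest) / len(rest)

    return align + lam * preserve
```

In an actual attack, a score of this kind would be maximized over a bounded image perturbation using gradients from a surrogate model; that optimization loop is omitted here because the abstract does not specify it.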

