CV · Jan 29

LAMP: Learning Universal Adversarial Perturbations for Multi-Image Tasks via Pre-trained Models

Alvi Md Ishmam, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, Chris Thomas

arXiv:2601.21220v1 · 1.5 · h-index: 5

Originality: Highly original

AI Analysis

This addresses a security problem for users of multi-image MLLMs: it exposes previously unexplored vulnerabilities in a practical black-box setting and proposes a novel attack method for a known weak point, the aggregation of information across images.

The paper tackles the vulnerability of multi-image multimodal large language models (MLLMs) by introducing LAMP, a black-box method for learning universal adversarial perturbations (UAPs) that disrupt information aggregation across images, achieving the highest attack success rates across multiple tasks and models.

Multimodal Large Language Models (MLLMs) have achieved remarkable performance across vision-language tasks. Recent advancements allow these models to process multiple images as inputs. However, the vulnerabilities of multi-image MLLMs remain unexplored. Existing adversarial attacks focus on single-image settings and often assume a white-box threat model, which is impractical in many real-world scenarios. This paper introduces LAMP, a black-box method for learning Universal Adversarial Perturbations (UAPs) targeting multi-image MLLMs. LAMP applies an attention-based constraint that prevents the model from effectively aggregating information across images. LAMP also introduces a novel cross-image contagious constraint that forces perturbed tokens to influence clean tokens, spreading adversarial effects without requiring all inputs to be modified. Additionally, an index-attention suppression loss enables a robust position-invariant attack. Experimental results show that LAMP outperforms SOTA baselines and achieves the highest attack success rates across multiple vision-language tasks and models.
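As a rough illustration of two ideas in the abstract, the sketch below shows (1) a single shared perturbation applied to only some of the images in a multi-image input under an L-infinity budget, and (2) a toy measure of how much attention flows between tokens of different images, which an attention-based constraint would try to drive down. This is not the paper's implementation: the function names, the `eps` budget, the `[0, 1]` pixel range, and the cross-image attention measure are all assumptions made for illustration.

```python
import numpy as np


def apply_uap(images, delta, eps, perturbed_idx):
    """Add one shared perturbation to the selected images (hypothetical helper).

    images: array of shape (num_images, H, W, C) with pixels in [0, 1] (assumed).
    delta: array of shape (H, W, C), the universal perturbation.
    eps: L-infinity budget (assumed constraint type).
    perturbed_idx: indices of the images that receive the perturbation.
    """
    delta = np.clip(delta, -eps, eps)  # enforce ||delta||_inf <= eps
    out = images.copy()
    for i in perturbed_idx:
        # Images not listed in perturbed_idx stay clean.
        out[i] = np.clip(images[i] + delta, 0.0, 1.0)
    return out


def cross_image_attention_mass(attn, image_of_token):
    """Fraction of attention mass between tokens of different images.

    attn: (num_tokens, num_tokens) non-negative attention weights.
    image_of_token: (num_tokens,) image index of each token.
    A toy proxy only; the paper's actual loss is not reproduced here.
    """
    cross = image_of_token[:, None] != image_of_token[None, :]
    return float((attn * cross).sum() / attn.sum())
```

In this sketch, an attacker would optimise `delta` so that a quantity like `cross_image_attention_mass` falls on a surrogate pre-trained model, then reuse the same `delta` on unseen inputs.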
