CV CR LG

Feb 1, 2024

Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

Maan Qraitem, Nazia Tasnim, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

arXiv:2402.00626v322.135 citations

h-index: 25

Has Code

Originality: Incremental advance

AI Analysis

This work addresses vulnerabilities in LVLMs that could amplify misinformation in personal assistant applications, representing an incremental improvement over previous simple attack strategies.

The paper studies typographic attacks that deceive large vision-language models (LVLMs). It introduces self-generated attacks that exploit the models' own language reasoning and reduce classification performance by up to 60% across models such as InstructBLIP and MiniGPT4.

Typographic attacks, which add misleading text to images, can deceive large vision-language models (LVLMs). The susceptibility of recent large LVLMs like GPT4-V to such attacks is understudied, raising concerns about amplified misinformation in personal assistant applications. Previous attacks use simple strategies, such as random misleading words, which don't fully exploit LVLMs' language reasoning abilities. We introduce an experimental setup for testing typographic attacks on LVLMs and propose two novel self-generated attacks: (1) Class-based attacks, where the model identifies a similar class to deceive itself, and (2) Reasoned attacks, where an advanced LVLM suggests an attack combining a deceiving class and description. Our experiments show these attacks significantly reduce classification performance by up to 60% and are effective across different models, including InstructBLIP and MiniGPT4. Code: https://github.com/mqraitem/Self-Gen-Typo-Attack
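As a rough illustration of the class-based idea described in the abstract, the sketch below picks a "deceiving" class by choosing the candidate label most similar to the true one. It is not the authors' implementation: in the paper the model itself names the similar class, whereas here a character-level string similarity from Python's standard library stands in for that step. The names `pick_deceiving_class` and `similarity` are hypothetical and exist only for this example.

```python
from difflib import SequenceMatcher


def pick_deceiving_class(true_class, candidates, similarity=None):
    """Return the candidate label most similar to true_class, excluding it.

    `similarity` is a hypothetical hook: any function (a, b) -> float.
    In the paper's setting this judgement comes from the model itself;
    the default below is only a toy stand-in based on string overlap.
    """
    if similarity is None:
        def similarity(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    # The attack text must name a class other than the correct one.
    others = [c for c in candidates if c.lower() != true_class.lower()]
    if not others:
        raise ValueError("need at least one class other than the true class")

    # The most similar remaining label becomes the misleading text.
    return max(others, key=lambda c: similarity(true_class, c))


if __name__ == "__main__":
    labels = ["cat", "car", "dog"]
    # The returned word would then be written onto the image as the attack.
    print(pick_deceiving_class("cat", labels))
```

Passing a custom `similarity` function (for example, one backed by model scores) replaces the toy string measure without changing the selection logic.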
