CL · CR · LG · Aug 7, 2024

Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection

arXiv:2408.03554v1 · 21 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a security problem for users of large vision-language models by exposing a novel attack vector, though the contribution is incremental because it builds on known prompt injection concepts.

The paper tackles the vulnerability of large vision-language models to visual prompt injection attacks, specifically goal hijacking, and finds that GPT-4V has a 15.8% attack success rate, indicating a significant security risk.

We explore visual prompt injection (VPI) that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method, "goal hijacking via visual prompt injection" (GHVPI), that swaps the execution task of LVLMs from an original task to an alternative task designated by an attacker. The quantitative analysis indicates that GPT-4V is vulnerable to the GHVPI and demonstrates a notable attack success rate of 15.8%, which is an unignorable security risk. Our analysis also shows that successful GHVPI requires high character recognition capability and instruction-following ability in LVLMs.
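The headline figure above is an attack success rate (ASR). The following is a minimal sketch of how such a rate can be tallied. It assumes that ASR is simply the fraction of attack attempts in which the model carries out the attacker's designated task instead of the original one. The `Trial` record and `attack_success_rate` function are hypothetical names introduced for illustration, and the paper's actual procedure for judging whether a response was hijacked is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical GHVPI attempt (illustrative structure, not from the paper)."""
    original_task: str   # task the user actually asked the LVLM to perform
    injected_task: str   # alternative task drawn onto the image by the attacker
    hijacked: bool       # True if the response carried out the injected task


def attack_success_rate(trials: list[Trial]) -> float:
    """Return the fraction of trials in which the model was hijacked.

    Assumes ASR = hijacked attempts / total attempts; returns 0.0 for no trials.
    """
    if not trials:
        return 0.0
    # bools sum as 0/1, so this counts the hijacked trials
    return sum(t.hijacked for t in trials) / len(trials)
```

With a toy list of four trials in which one is hijacked, the function returns 0.25. Under this definition, a 15.8% rate corresponds to roughly 158 hijacks in 1,000 attempts.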
