CRCV, Jul 1, 2024

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything

arXiv:2407.02534v2 | 7 citations | h-index: 4 | Has Code
AI Analysis

This research addresses security vulnerabilities in VLMs, a concern for AI safety and robustness. It highlights a specific gap: weaknesses in logic understanding that malicious attackers could exploit.

The paper tackles logic-based jailbreak vulnerabilities in Visual Language Models (VLMs). It introduces Flow-JD, a novel dataset for evaluating logic-based flowchart jailbreaks. It finds jailbreak rates of up to 92.8% across the evaluated models, which include GPT-4o and GPT-4V.

Large Visual Language Models (VLMs) such as GPT-4V have achieved remarkable success in generating comprehensive and nuanced responses, and researchers have proposed various benchmarks for evaluating their capabilities. Because VLMs integrate visual and text inputs, new security issues emerge: malicious attackers can exploit multiple modalities to achieve their objectives. This has drawn increasing attention to the vulnerability of VLMs to jailbreaks. Most existing research focuses on generating adversarial images or nonsensical images to jailbreak these models. However, no prior work has evaluated whether the logic understanding capabilities of VLMs on flowcharts can influence jailbreaks. To fill this gap, this paper first introduces a novel dataset, Flow-JD, specifically designed to evaluate the logic-based flowchart jailbreak capabilities of VLMs. We conduct an extensive evaluation on GPT-4o, GPT-4V, and five other state-of-the-art open-source VLMs, and the jailbreak rate reaches up to 92.8%. Our research reveals significant vulnerabilities in current VLMs concerning image-to-text jailbreaks, and these findings underscore the urgency of developing robust and effective defenses.

