CL | AI | Jun 22, 2025

GRAF: Multi-turn Jailbreaking via Global Refinement and Active Fabrication

Baidu
arXiv:2506.17881v2 | 1 citation | h-index: 17 | Has Code
Originality: Highly original
AI Analysis

This paper addresses security vulnerabilities in LLMs and is aimed at safety researchers, though it is incremental in that it builds on prior multi-turn jailbreaking approaches.

The paper studies jailbreaking, that is, inducing large language models to generate harmful content. It proposes GRAF, a multi-turn method that globally refines the attack trajectory at each interaction and fabricates model responses to suppress safety warnings. Across six state-of-the-art LLMs, the authors report higher effectiveness than existing single-turn and multi-turn methods.

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks. Nevertheless, they still pose notable safety risks due to potential misuse for malicious purposes. Jailbreaking, which seeks to induce models to generate harmful content through single-turn or multi-turn attacks, plays a crucial role in uncovering underlying security vulnerabilities. However, prior methods, including sophisticated multi-turn approaches, often struggle to adapt to the evolving dynamics of dialogue as interactions progress. To address this challenge, we propose GRAF (JailBreaking via Globally Refining and Adaptively Fabricating), a novel multi-turn jailbreaking method that globally refines the attack trajectory at each interaction. In addition, we actively fabricate model responses to suppress safety-related warnings, thereby increasing the likelihood of eliciting harmful outputs in subsequent queries. Extensive experiments across six state-of-the-art LLMs demonstrate the superior effectiveness of our approach compared to existing single-turn and multi-turn jailbreaking methods. Our code will be released at https://github.com/Ytang520/Multi-Turn_jailbreaking_Global-Refinment_and_Active-Fabrication.

Code Implementations: 1 repo
