CR · AI · Aug 25, 2025

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience

arXiv:2508.19292v1 · 4 citations · h-index: 9 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective security testing of LLMs, though the advance is incremental because it builds on existing jailbreak techniques.

The paper tackles the problem of inefficient and repetitive optimization in automated jailbreak attacks on large language models by proposing JailExpert, a framework that integrates past attack experiences through formal representation and semantic grouping. The results show that JailExpert achieves a 17% average increase in attack success rate and a 2.7-fold improvement in attack efficiency compared to state-of-the-art black-box jailbreak methods.

Large language models (LLMs) generate human-aligned content under certain safety constraints. However, the known technique of "jailbreak prompts" can circumvent safety alignment measures and induce LLMs to output malicious content. Research on jailbreaking can help identify vulnerabilities in LLMs and guide the development of robust security frameworks. To keep attack templates from becoming obsolete as models evolve, existing methods adopt iterative mutation and dynamic optimization to facilitate more automated jailbreak attacks. However, these methods face two challenges, inefficiency and repetitive optimization, because they overlook the value of past attack experiences. To better integrate past attack experiences into current jailbreak attempts, we propose JailExpert, an automated jailbreak framework that is the first to achieve a formal representation of experience structure, group experiences based on semantic drift, and support dynamic updating of the experience pool. Extensive experiments demonstrate that JailExpert significantly improves both attack effectiveness and efficiency. Compared to the current state-of-the-art black-box jailbreak methods, JailExpert achieves an average increase of 17% in attack success rate and a 2.7-fold improvement in attack efficiency. Our implementation is available at https://github.com/xiZAIzai/JailExpert.

Code Implementations: 1 repo
