CR · AI · Aug 25, 2025

Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience

Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Bin Ji, Jun Ma, Xiaodong Liu, Jing Wang, Feilong Bao, Jianfeng Zhang, Baosheng Wang, Jie Yu

arXiv:2508.19292v1 · 4 citations · h-index: 9 · Has Code · EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses the need for more effective security testing of LLMs, though the advance is incremental because it builds on existing jailbreak techniques.

The paper tackles the problem of inefficient and repetitive optimization in automated jailbreak attacks on large language models by proposing JailExpert, a framework that integrates past attack experiences through formal representation and semantic grouping. The results show that JailExpert achieves a 17% average increase in attack success rate and a 2.7-fold improvement in efficiency compared to state-of-the-art black-box methods.

Large language models (LLMs) generate human-aligned content under certain safety constraints. However, the technique known as the "jailbreak prompt" can circumvent safety alignment measures and induce LLMs to output malicious content. Research on jailbreaking can help identify vulnerabilities in LLMs and guide the development of robust security frameworks. To keep attack templates from becoming obsolete as models evolve, existing methods adopt iterative mutation and dynamic optimization to enable more automated jailbreak attacks. However, these methods face two challenges, inefficiency and repetitive optimization, because they overlook the value of past attack experiences. To better integrate past attack experiences into current jailbreak attempts, we propose JailExpert, an automated jailbreak framework that is the first to achieve a formal representation of experience structure, group experiences based on semantic drift, and support dynamic updating of the experience pool. Extensive experiments demonstrate that JailExpert significantly improves both attack effectiveness and efficiency. Compared to the current state-of-the-art black-box jailbreak methods, JailExpert achieves an average increase of 17% in attack success rate and a 2.7-fold improvement in attack efficiency. Our implementation is available at https://github.com/xiZAIzai/JailExpert.
