CL · AI · May 18, 2025

Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression

Jingyu Peng, Maolin Wang, Nan Wang, Jiatong Li, Yuchen Li, Yuyang Ye, Wanyu Wang, Pengyue Jia, Kai Zhang, Xiangyu Zhao

arXiv:2505.13527v2 · 14.7 · 12 citations · h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses a critical security problem for LLM users by exposing and exploiting weaknesses in safety alignment, though it is incremental because it builds on existing jailbreak techniques.

The paper tackles the vulnerability of LLM safety mechanisms to jailbreak attacks by introducing LogiBreak, a method that converts harmful prompts into formal logical expressions to exploit distributional gaps, and it achieves effective evasion on a multilingual dataset spanning three languages.

Despite substantial advancements in aligning large language models (LLMs) with human values, current safety mechanisms remain susceptible to jailbreak attacks. We hypothesize that this vulnerability stems from distributional discrepancies between alignment-oriented prompts and malicious prompts. To investigate this, we introduce LogiBreak, a novel and universal black-box jailbreak method that leverages logical expression translation to circumvent LLM safety systems. By converting harmful natural language prompts into formal logical expressions, LogiBreak exploits the distributional gap between alignment data and logic-based inputs, preserving the underlying semantic intent and readability while evading safety constraints. We evaluate LogiBreak on a multilingual jailbreak dataset spanning three languages, demonstrating its effectiveness across various evaluation settings and linguistic contexts.

