CR AI, May 16

New Wide-Net-Casting Jailbreak Attacks Risk Large Models

Qiuchi Xiang, Haoxuan Qu, Hossein Rahmani, Jun Liu

arXiv:2605.17128

AI Analysis

For AI safety researchers, the paper reveals a previously overlooked high-risk attack scenario that calls for new evaluation and defense strategies.

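The paper identifies a new jailbreak scenario in which an adversary queries a group of large models, rather than a single one, to elicit harmful outputs. With a method tailored to this scenario, the jailbreak success rate reaches up to 100% in some experiments against models that have no additional safeguards.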

Jailbreak attacks on large models have drawn growing attention due to their close ties to societal safety. This work identifies a practical yet unexplored jailbreak scenario, the wide-net-casting scenario, where an adversary can query a group of large models instead of a single one to elicit harmful outputs. Our analysis reveals substantial yet previously overlooked safety risks under this scenario. As a key part of our analysis, we further develop a novel jailbreak method tailored to the wide-net-casting scenario. With this tailored method, the jailbreak success rate can even reach 100% in some experiments when targeting the large models without additional safeguards, exposing wide-net-casting as a distinct, high-risk scenario that warrants attention in future evaluation and defense research.
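
To illustrate why evaluating a group of models can differ from evaluating each model alone, the sketch below computes the chance that at least one model in a group produces a harmful output. It is not the paper's method or metric. It assumes that outcomes are independent across models, which is a simplification introduced here, and the function name `group_success_rate` and the per-model rates are hypothetical values chosen only for illustration.

```python
from math import prod


def group_success_rate(per_model_rates):
    """Chance that at least one model in the group yields a harmful output.

    Assumes outcomes are independent across models (an illustrative
    assumption, not a claim from the paper).
    """
    if any(not 0.0 <= p <= 1.0 for p in per_model_rates):
        raise ValueError("each rate must lie in [0, 1]")
    # P(at least one success) = 1 - P(every model refuses)
    return 1.0 - prod(1.0 - p for p in per_model_rates)


# Hypothetical per-model rates, not results from the paper.
rates = [0.2, 0.2, 0.2, 0.2, 0.2]
print(round(group_success_rate(rates), 4))  # prints 0.6723
```

Under these assumptions, five models that each fail 20% of the time give a group-level rate of about 67%, so per-model numbers alone can understate the risk when an adversary is free to pick whichever model complies.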
