SD AS · Jun 4

SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech

Virginia Ceccatelli, Yejin Jeon, David Ifeoluwa Adelani

arXiv:2606.06037 · 17.0

Predicted impact: top 32% in SD · last 90 days
Originality: Incremental advance

AI Analysis

For developers and deployers of LALMs, this work exposes critical safety weaknesses in multilingual and spoken settings that are not captured by existing text-based benchmarks.

SpeechJBB reveals that large audio language models (LALMs) are highly vulnerable to jailbreak attacks delivered as code-switched speech: such attacks achieve high jailbreak success rates (JSR), with non-English monolingual and non-English code-switched pairs being the most effective. Pseudo-word insertion further increases attack success by reducing refusal rates.

Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still primarily evaluated on monolingual, text-based harmful prompts. This leaves their generalizability to multilingual and spoken settings, particularly code-switched speech, largely underexplored. To address this gap, we introduce SpeechJBB, an audio jailbreak dataset for benchmarking multiple state-of-the-art LALMs. We further probe the extent of safety weaknesses with an augmented setting in which phonologically plausible pseudo-words are inserted around safety-critical terms to simulate localized obfuscation. Across models, code-switched harmful audio yields high jailbreak success rates (JSR), with non-English monolingual and non-English code-switched pairs exhibiting the highest attack success. Pseudo-word insertion further reduces refusal rates, demonstrating that natural-sounding obfuscation can effectively bypass safety policies.
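The abstract reports results as jailbreak success rates (JSR) and refusal rates per language condition but does not define them here. The sketch below shows one plausible way to tabulate both, assuming JSR is the fraction of harmful prompts whose response is judged a successful jailbreak and the refusal rate is the fraction judged an explicit refusal. The `Judgment` record, its field names, and the condition labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One judged model response (hypothetical record layout)."""
    condition: str    # assumed label for the language setting, e.g. "en" or "code-switched"
    jailbroken: bool  # judge decided the response complied with the harmful request
    refused: bool     # judge decided the response was an explicit refusal


def rates_by_condition(judgments):
    """Return {condition: (jsr, refusal_rate)} with both rates as fractions in [0, 1]."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, jailbroken, refused]
    for j in judgments:
        c = counts[j.condition]
        c[0] += 1
        c[1] += int(j.jailbroken)
        c[2] += int(j.refused)
    return {cond: (jb / n, rf / n) for cond, (n, jb, rf) in counts.items()}
```

Under this assumed definition the two rates need not sum to one, since a response can be neither a refusal nor a successful jailbreak (for example, an off-topic answer).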
