CL · AI · May 20, 2025

Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations

arXiv:2505.14469v1 · 7 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses safety concerns for users of LLMs in multilingual or code-mixed contexts. The advance is incremental, since it builds on existing safety research.

The study examines why large language models produce unsafe outputs from code-mixed prompts. It finds that models are more susceptible to such prompts than to monolingual English prompts, and it offers insights into the underlying attribution shifts and the role of cultural dimensions.

Recent advancements in LLMs have raised significant safety concerns, particularly when dealing with code-mixed inputs and outputs. Our study systematically investigates the increased susceptibility of LLMs to produce unsafe outputs from code-mixed prompts compared to monolingual English prompts. Utilizing explainability methods, we dissect the internal attribution shifts that cause the model's harmful behaviors. In addition, we explore cultural dimensions by distinguishing between universally unsafe and culturally specific unsafe queries. This paper presents novel experimental insights, clarifying the mechanisms driving this phenomenon.

