CL AI · May 22, 2025

Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification

Himanshu Beniwal, Youngwoo Kim, Maarten Sap, Soham Dan, Thomas Hartvigsen

arXiv:2505.16722v3 · 4.92 citations · h-index: 8 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of keeping LLMs toxicity-free in global applications, but it appears incremental: it extends existing detoxification methods to cross-lingual contexts.

The paper tackles the problem of reducing toxicity in large language models across diverse languages by exploring cross-lingual detoxification, evaluating its effectiveness in 392 settings to assess toxicity reduction and trade-offs with model performance.

As large language models (LLMs) become increasingly prevalent in global applications, ensuring that they are toxicity-free across diverse linguistic contexts remains a critical challenge. We explore "Cross-lingual Detoxification", a cross-lingual paradigm that mitigates toxicity, enabling detoxification capabilities to transfer between high and low-resource languages across different script families. We analyze cross-lingual detoxification's effectiveness through 392 extensive settings to evaluate toxicity reduction in cross-distribution settings with limited data and investigate how mitigation impacts model performance on non-toxic tasks, revealing trade-offs between safety and knowledge preservation. Our code and dataset are publicly available at https://github.com/himanshubeniwal/Breaking-mBad.
