CL | May 15, 2024

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap

AI2, CMU

arXiv:2405.09373v3 | 10.420 citations | h-index: 49 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses safety risks in deploying LLMs globally by providing a multilingual toxicity benchmark. The advance is incremental: it extends existing evaluation frameworks to more languages.

The authors address the lack of multilingual toxicity evaluation for large language models by introducing PolygloToxicityPrompts, a benchmark of 425K prompts across 17 languages. They find that toxicity increases as model size grows or as language resources decrease, and that instruction- and preference-tuning reduce toxicity, with no significant difference between preference-tuning methods.

Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that toxicity increases as language resources decrease or model size increases. Although instruction- and preference-tuning reduce toxicity, the choice of preference-tuning method does not have any significant impact. Our findings shed light on crucial shortcomings of LLM safeguarding and highlight areas for future research.
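To make the comparison across prompt languages concrete, the following is a minimal sketch of how per-language toxicity might be aggregated once model continuations have been scored. The abstract does not specify the toxicity scorer or the metric, so everything here is an assumption for illustration: the function name `average_toxicity_by_language`, the `(language, score)` record layout, the choice of a simple mean, and the sample values are all hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def average_toxicity_by_language(records):
    """Group toxicity scores by language and return the mean per language.

    `records` is an iterable of (language, score) pairs, where each score is
    assumed to lie in [0, 1] and to come from some external toxicity scorer.
    Both the record layout and the use of a plain mean are assumptions.
    """
    scores_by_language = defaultdict(list)
    for language, score in records:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {language!r}: {score}")
        scores_by_language[language].append(score)
    return {lang: mean(scores) for lang, scores in scores_by_language.items()}


# Made-up scores for two placeholder languages; not results from the paper.
sample_records = [
    ("lang_a", 0.25),
    ("lang_a", 0.75),
    ("lang_b", 0.5),
    ("lang_b", 1.0),
]
per_language = average_toxicity_by_language(sample_records)
```

A comparison by model size or tuning method would follow the same pattern, grouping on a different key instead of the language.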
