Apr 22, 2024

RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

CMU, DeepMind, Microsoft
arXiv:2404.14397v2 | 39 citations | h-index: 6 | AAAI
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable safety evaluations in multilingual AI deployments, though it is incremental in that it builds on existing toxicity-detection efforts.

The paper tackles the problem of evaluating toxicity detection in multilingual large and small language models by introducing RTP-LX, a human-annotated dataset in 28 languages, and finds that while models achieve acceptable accuracy, they have low agreement with human judges and struggle with culturally-specific and subtle harmful content.

Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate 10 S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when holistically scoring the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g. microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.
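The gap the abstract describes between acceptable accuracy and low agreement with human judges can be illustrated with a small sketch. The abstract does not say which agreement statistic the authors use, so Cohen's kappa is an assumption here, and the labels below are invented for illustration rather than taken from RTP-LX. When most prompts are labeled benign, a model that never flags anything still matches most human labels, yet its chance-corrected agreement is zero.

```python
from collections import Counter


def accuracy(human, model):
    """Fraction of items where the model label equals the human label."""
    return sum(h == m for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Chance-corrected agreement between two label sequences.

    Cohen's kappa is an assumed stand-in for the agreement measure;
    the abstract does not name the statistic actually used.
    """
    n = len(human)
    p_observed = accuracy(human, model)
    human_counts = Counter(human)
    model_counts = Counter(model)
    labels = set(human) | set(model)
    # Agreement expected if both raters labeled independently
    # according to their own label frequencies.
    p_expected = sum(human_counts[l] * model_counts[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both raters use one identical label
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels: 0 = benign, 1 = subtly harmful (e.g. a microaggression).
human_labels = [0] * 18 + [1] * 2
model_labels = [0] * 20  # the model never flags the subtle cases

print(f"accuracy = {accuracy(human_labels, model_labels):.2f}")
print(f"kappa    = {cohen_kappa(human_labels, model_labels):.2f}")
```

In this toy case the accuracy is 0.90 while kappa is 0.00, which is one way a model can look acceptable on accuracy and still disagree with human judges on exactly the subtle content that matters.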

Code Implementations: 1 repo
