CL · Dec 5, 2025

SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures

arXiv:2512.05501v1
Originality: Synthesis-oriented
AI Analysis

The paper addresses the underrepresentation of Southeast Asian languages in AI safety evaluation, a gap that affects both AI developers and users. The contribution is incremental: it extends existing benchmarking approaches to a new region.

The paper tackles the lack of linguistic and cultural diversity in AI safety evaluations by introducing SEA-SafeguardBench, a human-verified benchmark for Southeast Asian languages. Its results show that state-of-the-art LLMs and guardrails perform worse on these scenarios than on English text.

Safeguard models help large language models (LLMs) detect and block harmful content, but most evaluations remain English-centric and overlook linguistic and cultural diversity. Existing multilingual safety benchmarks often rely on machine-translated English data, which fails to capture nuances in low-resource languages. Southeast Asian (SEA) languages are underrepresented despite the region's linguistic diversity and unique safety concerns, from culturally sensitive political speech to region-specific misinformation. Addressing these gaps requires benchmarks that are natively authored to reflect local norms and harm scenarios. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA, covering eight languages, 21,640 samples, across three subsets: general, in-the-wild, and content generation. The experimental results from our benchmark demonstrate that even state-of-the-art LLMs and guardrails are challenged by SEA cultural and harm scenarios and underperform when compared to English texts.

