CL | Feb 2

SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia

Panuthep Tasawong, Jian Gang Ngui, Alham Fikri Aji, Trevor Cohn, Peerat Limkonchotiwat

arXiv:2602.01618v1 | 0.6 | h-index: 36

Originality: Incremental advance

AI Analysis

This work addresses the need for culturally grounded AI safeguards in Southeast Asia. It offers a scalable solution to a domain-specific problem, and its contribution to data generation is an incremental improvement.

The paper tackles the problem of building culturally aware AI safeguards for Southeast Asia. It develops an agentic data-generation framework that creates authentic, region-specific safety datasets. The resulting SEA-Guard models consistently outperform existing safeguards at detecting regionally sensitive content while maintaining general safety performance.

Culturally aware safeguards are crucial for AI alignment in real-world settings, where safety extends beyond common sense and encompasses diverse local values, norms, and region-specific regulations. However, building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. Consequently, many safeguard models rely on machine translation of English datasets, often missing regional and cultural nuances. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia (SEA). On this foundation, we introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts. Evaluated across multiple benchmarks and cultural variants, SEA-Guard consistently outperforms existing safeguards at detecting regionally sensitive or harmful content while maintaining strong general safety performance.
