CL · Oct 12, 2025

Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data

arXiv:2510.10677v1 · 1 citation · h-index: 3 · Has Code · Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
Originality: Highly original
AI Analysis

This work addresses the need for effective, interpretable safeguards against malicious requests in low-resource languages. It represents a strong, targeted gain rather than a broad foundational advance.

The paper tackles the poor performance and lack of interpretability of LLM safeguards in low-resource languages. It proposes ConsistentGuard, a reasoning-based method that, trained on only 1,000 samples, achieves superior performance across six languages and outperforms larger models trained on more data.
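Because the safeguard is reasoning-based, its output carries a rationale alongside the decision. The following is a minimal sketch of how a downstream system might split such output into an auditable rationale and a binary verdict. It assumes a hypothetical trailer line of the form `Verdict: harmful` or `Verdict: safe`; `parse_guard_output` and `GuardDecision` are illustrative names, and ConsistentGuard's actual output format may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: the guard ends its reasoning with a line "Verdict: harmful" or
# "Verdict: safe". This format is hypothetical, not taken from the paper.
VERDICT_RE = re.compile(
    r"^\s*verdict\s*:\s*(harmful|safe)\s*$", re.IGNORECASE | re.MULTILINE
)


@dataclass
class GuardDecision:
    reasoning: str            # free-text rationale kept for auditing
    harmful: Optional[bool]   # None when no verdict line could be parsed


def parse_guard_output(text: str) -> GuardDecision:
    """Split a reasoning-style guard response into rationale and verdict."""
    matches = list(VERDICT_RE.finditer(text))
    if not matches:
        # No parsable verdict: keep the text, leave the decision undefined
        return GuardDecision(reasoning=text.strip(), harmful=None)
    last = matches[-1]  # trust the final verdict if the model restates it
    reasoning = text[: last.start()].strip()
    return GuardDecision(reasoning=reasoning, harmful=last.group(1).lower() == "harmful")


if __name__ == "__main__":
    out = "The request asks for weapon instructions.\nThis violates policy.\nVerdict: harmful"
    decision = parse_guard_output(out)
    print(decision.harmful)
    print(decision.reasoning)
```

Returning `None` rather than defaulting to "safe" lets the caller decide how to treat malformed outputs, for example by blocking them or re-querying the model.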

Recent advances in LLMs have enhanced AI capabilities, but have also increased the risk posed by malicious requests, highlighting the need for effective LLM safeguards to detect such queries. Existing approaches largely rely on classifier-based methods that lack interpretability and perform poorly on low-resource languages. To address these limitations, we propose ConsistentGuard, a novel reasoning-based multilingual safeguard that enhances explainability via reasoning and boosts knowledge transfer between languages through alignment. With only 1,000 training samples, our method demonstrates superior performance on three datasets across six languages, outperforming larger models trained with significantly more data, and exhibits strong interpretability and generalization ability. We also contribute a multilingual benchmark extension and release our code to support future research.
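Since the method aims to transfer safety knowledge across languages, one simple diagnostic is whether a safeguard returns the same verdict on parallel translations of the same prompt. The sketch below computes that agreement rate. It is a hypothetical evaluation aid, not the paper's alignment method; `cross_lingual_agreement`, its input layout, and the language keys `lang_a`/`lang_b` are illustrative placeholders rather than the paper's six languages.

```python
from typing import Mapping, Sequence


def cross_lingual_agreement(verdicts: Sequence[Mapping[str, bool]]) -> float:
    """Fraction of parallel prompts on which all languages get the same verdict.

    Each element maps a language key to the guard's verdict (True = harmful)
    for one prompt and its translations. HYPOTHETICAL helper for illustration.
    """
    if not verdicts:
        raise ValueError("need at least one parallel prompt")
    # A prompt counts as consistent only if its verdicts collapse to one value;
    # an empty mapping yields an empty set and is therefore not consistent.
    consistent = sum(1 for v in verdicts if len(set(v.values())) == 1)
    return consistent / len(verdicts)


if __name__ == "__main__":
    batch = [
        {"lang_a": True, "lang_b": True},    # verdicts agree
        {"lang_a": True, "lang_b": False},   # verdicts disagree
    ]
    print(cross_lingual_agreement(batch))
```

A rate below 1.0 flags prompts where the guard's judgment depends on the input language, which is the kind of gap that cross-lingual alignment is meant to close.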
