CL · AI · Mar 29, 2025

UNITYAI-GUARD: Pioneering Toxicity Detection Across Low-Resource Indian Languages

Himanshu Beniwal, Reddybathuni Venkat, Rohit Kumar, Birudugadda Srivibhav, Daksh Jain, Pavan Doddi, Eshwar Dhande, Adithya Ananth, Kuldeep, Mayank Singh

arXiv:2503.23088v2 · 4.91 citations · h-index: 4 · EMNLP

Originality: Synthesis-oriented

AI Analysis

The work addresses content moderation for linguistically diverse regions, but its contribution is incremental: it applies existing methods to new data.

The paper tackles toxicity detection in low-resource Indian languages by developing UnityAI-Guard, achieving an average F1-score of 84.23% across seven languages.

This work introduces UnityAI-Guard, a framework for binary toxicity classification targeting low-resource Indian languages. While existing systems predominantly cater to high-resource languages, UnityAI-Guard addresses this critical gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an impressive average F1-score of 84.23% across seven languages, leveraging a dataset of 567k training instances and 30k manually verified test instances. By advancing multilingual content moderation for linguistically diverse regions, UnityAI-Guard also provides public API access to foster broader adoption and application.
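As a rough illustration of the headline metric, the sketch below computes a binary F1-score per language and then takes the unweighted mean across languages. The abstract does not say how the 84.23% average was aggregated, so the macro-average over languages, the function names `binary_f1` and `average_f1`, the placeholder language keys, and the toy labels are all assumptions made for illustration, not the authors' evaluation code.

```python
from typing import Dict, List, Tuple


def binary_f1(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (toxic = 1) class of a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); define it as 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def average_f1(per_language: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-language F1 scores (assumed aggregation)."""
    scores = [binary_f1(y_true, y_pred) for y_true, y_pred in per_language.values()]
    return sum(scores) / len(scores)


# Hypothetical toy labels: (gold labels, predicted labels) per language.
toy = {
    "lang_a": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "lang_b": ([0, 1, 0, 1], [0, 1, 1, 1]),
}

print(f"average F1: {average_f1(toy):.4f}")
```

An unweighted mean treats each language equally regardless of test-set size; weighting by the number of test instances would be an equally plausible reading of "average".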
