Multilingual Abusiveness Identification on Code-Mixed Social Media Text
This work addresses the underexplored issue of abusiveness detection in non-English social media, which is crucial for platform safety and user experience in diverse linguistic contexts.
The paper tackles the identification of abusive content in multilingual, code-mixed social media text, specifically on the Moj dataset of Indic languages, and proposes an approach that handles challenges such as code-mixing and transliteration.
Social media platforms have seen steady growth in adoption and usage over time. This growth accelerated further during the lockdown of the past year, when people's ability to interact, converse, and express themselves in person was limited. As usage grows, keeping platforms free of abusive content becomes increasingly important for a good user experience. Much work has been done on English social media content, but text analysis of non-English social media remains relatively underexplored. Non-English social media content poses the additional challenges of code-mixing, transliteration, and the use of different scripts within the same sentence. In this work, we propose an approach for abusiveness identification on the multilingual Moj dataset, which comprises Indic languages. Our approach tackles these common challenges of non-English social media content and can be extended to other languages as well.
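To make the "different scripts within the same sentence" challenge concrete, the sketch below tags each whitespace-separated token with a coarse script label taken from the Unicode character names (e.g. "DEVANAGARI LETTER HA", "LATIN SMALL LETTER A") and flags sentences that mix scripts. This is an illustrative example only, not the method proposed in this work; the helper names `token_script` and `is_script_mixed` and the sample sentence are hypothetical.

```python
from __future__ import annotations

import unicodedata


def token_script(token: str) -> str:
    """Return a coarse script label for a token (hypothetical helper).

    Only letter characters are inspected; combining vowel signs and
    punctuation are skipped because str.isalpha() is False for them.
    """
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        # Unicode names usually start with the script, e.g. "DEVANAGARI ..."
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    if not scripts:
        return "OTHER"  # no letters at all (digits, emoji, punctuation)
    if len(scripts) == 1:
        return scripts.pop()
    return "MIXED"  # a single token that itself mixes scripts


def is_script_mixed(sentence: str) -> bool:
    """True if the sentence contains letters from more than one script."""
    labels = {token_script(tok) for tok in sentence.split()}
    labels.discard("OTHER")
    return "MIXED" in labels or len(labels) > 1


if __name__ == "__main__":
    sample = "yeh movie बहुत अच्छी hai"  # hypothetical Hindi-English example
    for tok in sample.split():
        print(tok, token_script(tok))
    print("script-mixed:", is_script_mixed(sample))
```

Note that script detection alone does not resolve transliteration: in the sample, "yeh" and "hai" are Romanized Hindi yet are labelled LATIN, which is one reason code-mixed text is harder than monolingual text.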