CL · Sep 23, 2018

Mind Your Language: Abuse and Offense Detection for Code-Switched Languages

arXiv:1809.08652v1 · 37 citations
AI Analysis

This paper addresses hate speech detection for users in multilingual societies such as India, where code-switching is common. The contribution is incremental, however, since it applies existing methods to a new language domain.

The paper tackles abuse and offense detection in Hinglish (Hindi-English code-switched language), a challenging task due to its non-fixed grammar and vocabulary, and achieves state-of-the-art performance with an LSTM-based model using transfer learning.

In multilingual societies like the Indian subcontinent, code-switched languages are popular and convenient for users. In this paper, we study offense and abuse detection in the code-switched pair of Hindi and English (i.e., Hinglish), the most widely spoken such pair. The task is made difficult by the non-fixed grammar, vocabulary, semantics, and spellings of Hinglish. We apply transfer learning and build an LSTM-based model for hate speech classification. This model surpasses the current best models, establishing itself as the state of the art in the unexplored domain of Hinglish offensive text classification. We also release our model and the trained embeddings for research purposes.

