Cracking the Code: Enhancing Implicit Hate Speech Detection through Coding Classification
This work addresses the societal problem of online hate speech by improving detection of its implicit forms. The contribution appears incremental: it builds on existing LLM methods, and its main novelty is a new taxonomy.
The paper tackles the detection of implicit hate speech, which is subtler than explicit hate speech and harder for existing methods to catch. It introduces a taxonomy of six encoding strategies (codetypes) and integrates them into detection with large language models, which improves performance on both Chinese and English datasets.
The internet has become a hotspot for hate speech (HS), threatening societal harmony and individual well-being. While automatic detection methods perform well in identifying explicit hate speech (ex-HS), they struggle with more subtle forms, such as implicit hate speech (im-HS). We tackle this problem by introducing a new taxonomy for im-HS detection, defining six encoding strategies named codetypes. We present two methods for integrating codetypes into im-HS detection: 1) prompting large language models (LLMs) directly to classify sentences based on generated responses, and 2) using LLMs as encoders with codetypes embedded during the encoding process. Experiments show that the use of codetypes improves im-HS detection in both Chinese and English datasets, validating the effectiveness of our approach across different languages.
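The first integration method, prompting an LLM and classifying from its generated response, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the six codetypes or give the prompt wording, so the labels `codetype_1` to `codetype_6`, the prompt text, and the helper names `build_prompt` and `parse_response` are all hypothetical placeholders. The LLM call itself is left out; only the prompt construction and the response parsing are shown.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-ins: the abstract does not name the six codetypes.
PLACEHOLDER_CODETYPES = [f"codetype_{i}" for i in range(1, 7)]


def build_prompt(sentence: str, codetypes: Sequence[str]) -> str:
    """Assemble a classification prompt listing the codetypes (assumed wording)."""
    options = "\n".join(f"- {name}" for name in codetypes)
    return (
        "Decide whether the sentence contains implicit hate speech.\n"
        "If it does, name the encoding strategy it uses, chosen from:\n"
        f"{options}\n"
        "Answer with one label from the list, or 'none'.\n"
        f"Sentence: {sentence}\n"
        "Answer:"
    )


def parse_response(
    response: str, codetypes: Sequence[str]
) -> Tuple[bool, Optional[str]]:
    """Map a free-text model reply to (is_implicit_hate, codetype)."""
    text = response.strip().lower()
    # Check longer labels first so one label cannot shadow a longer one.
    for name in sorted(codetypes, key=len, reverse=True):
        if name.lower() in text:
            return True, name
    # No known label found: treat the reply as "not implicit hate speech".
    return False, None
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is being evaluated, and the generated text would be passed to `parse_response`. Treating any reply without a recognised label as negative is a design choice of this sketch, not something stated in the paper.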