LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models

Ryutaro Oshima, Yuya Hosoda, Youji Iiguni

arXiv:2601.04654v11.2 | h-index: 13 | APSIPA

Originality: Incremental advance

AI Analysis

The work addresses the shortage of annotated hate speech datasets needed to train models that prevent exposure to harmful content, though its approach is incremental.

The paper tackles the problem of automatically recognizing and censoring hate speech in audio by integrating ASR with LLMs, achieving a 58.6% masking accuracy for hate-related words, which outperforms previous baselines.

This paper proposes an automatic speech recognition (ASR) model for hate speech that uses large language models (LLMs). The proposed method integrates the encoder of the ASR model with the decoder of the LLM, enabling simultaneous transcription and censorship to prevent exposure to harmful content. Instruction tuning the LLM to mask hate-related words with specific tokens requires an annotated hate speech dataset, and such datasets are limited. We therefore generate text samples using an LLM with Chain-of-Thought (CoT) prompting guided by cultural context and examples, and then convert them into speech samples using a text-to-speech (TTS) system. However, some of the generated samples are non-hate speech that nonetheless contains hate-related words, which degrades censorship performance. To address this, we keep only the samples that text classification models correctly label as hate content. By adjusting the threshold on the number of models that answer correctly, we can control the level of hate in the generated dataset, which allows us to train the LLM gradually through curriculum learning. Experimental results show that the proposed method achieves a masking accuracy of 58.6% for hate-related words, surpassing previous baselines. We also confirm that curriculum training contributes to the efficiency of both the transcription and censorship tasks.
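The threshold-based filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the classifier interface, the function names (`count_agreeing_models`, `filter_by_threshold`, `build_curriculum`), and the toy keyword classifiers with the placeholder word "badword" are all assumptions made for illustration. Each generated sample is assumed to be intended as hate content, so a classifier counts as "correct" when it predicts hate.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a classifier returns True when it predicts "hate".
Classifier = Callable[[str], bool]


def count_agreeing_models(text: str, classifiers: Sequence[Classifier]) -> int:
    """Count how many classifiers label `text` as hate content."""
    return sum(1 for clf in classifiers if clf(text))


def filter_by_threshold(
    samples: Sequence[str], classifiers: Sequence[Classifier], threshold: int
) -> List[str]:
    """Keep samples that at least `threshold` classifiers label as hate."""
    return [s for s in samples if count_agreeing_models(s, classifiers) >= threshold]


def build_curriculum(
    samples: Sequence[str],
    classifiers: Sequence[Classifier],
    thresholds: Sequence[int],
) -> Dict[int, List[str]]:
    """Build one filtered subset per threshold (stage order is left to the caller)."""
    return {t: filter_by_threshold(samples, classifiers, t) for t in thresholds}


# Toy stand-ins for real text classification models (assumption for the demo).
toy_classifiers: List[Classifier] = [
    lambda t: "badword" in t,
    lambda t: "badword" in t and "group" in t,
    lambda t: t.count("badword") >= 2,
]

generated = [
    "hello there",
    "badword once",
    "badword group",
    "badword badword group",
]

stages = build_curriculum(generated, toy_classifiers, thresholds=[1, 2, 3])
```

A higher threshold keeps fewer samples, namely those that more models agree on. The abstract does not state the order in which the resulting subsets are presented during curriculum learning, so the sketch only produces the subsets and leaves the schedule open.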
