"Is Hate Lost in Translation?": Evaluation of Multilingual LGBTQIA+ Hate Speech Detection
This work addresses the challenge of detecting nuanced hate speech across languages for content moderation systems, though it is incremental in that it applies existing methods to new multilingual data.
This paper evaluated how well large language models detect LGBTQIA+ hate speech across multiple languages, finding that English had the highest performance while code-switched English-Tamil had the lowest, and that fine-tuning consistently improved results while machine translation yielded mixed outcomes.
This paper explores the challenges large language models face in detecting LGBTQIA+ hate speech across multiple languages, including English, Italian, Chinese and (code-switched) English-Tamil. It examines the impact of machine translation and whether the nuances of hate speech are preserved in translation. We examine the hate speech detection ability of zero-shot and fine-tuned GPT. Our findings indicate that: (1) performance is highest for English and lowest for the code-switched English-Tamil scenario, and (2) fine-tuning improves performance consistently across languages, whilst translation yields mixed results. Through simple experimentation with original and machine-translated text for hate speech detection, along with a qualitative error analysis, this paper sheds light on the socio-cultural nuances and complexities of languages that may not be captured by automatic translation.
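The comparison described above (original versus machine-translated text, scored per language) could be organised along the following lines. This is a minimal sketch, not the authors' evaluation code: macro-F1 is assumed as the metric because the abstract does not name one, and the record fields (`language`, `condition`, `gold`, `pred`) and the `toy_records` values are hypothetical placeholders rather than data from the paper.

```python
from collections import defaultdict


def f1_for_class(gold, pred, cls):
    """F1 score for a single class, treating `cls` as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed metric, not confirmed by the paper)."""
    classes = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


def score_by_condition(records):
    """Group predictions by (language, condition) and score each group."""
    grouped = defaultdict(lambda: ([], []))
    for r in records:
        gold, pred = grouped[(r["language"], r["condition"])]
        gold.append(r["gold"])
        pred.append(r["pred"])
    return {key: macro_f1(g, p) for key, (g, p) in grouped.items()}


# Invented toy labels for illustration only (1 = hate speech, 0 = not).
toy_records = [
    {"language": "Italian", "condition": "original", "gold": 1, "pred": 1},
    {"language": "Italian", "condition": "original", "gold": 0, "pred": 0},
    {"language": "Italian", "condition": "translated", "gold": 1, "pred": 0},
    {"language": "Italian", "condition": "translated", "gold": 0, "pred": 0},
]

scores = score_by_condition(toy_records)
```

Keying the scores by both language and condition keeps the original and translated runs side by side, which makes it straightforward to see where translation helps or hurts for a given language.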