CL · Aug 28, 2025

Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets

arXiv:2508.20750v1 · 1 citation · h-index: 3 · Proceedings of the 2nd International Workshop on Diffusion of Harmful Content on Online Web
Originality: Incremental advance
AI Analysis

This work addresses the problem of detecting subtle hate speech for content moderation, but the advance is incremental because it applies existing fine-tuning methods to a specific domain.

The paper tackles implicit hate speech, which is indirect and hard to identify, by fine-tuning general-purpose LLM embedding models such as Stella and E5. It reports state-of-the-art performance, with F1-macro improvements of up to 1.10 percentage points in-dataset and up to 20.35 percentage points cross-dataset.

Implicit hate speech (IHS) is indirect language that conveys prejudice or hatred through subtle cues, sarcasm or coded terminology. IHS is challenging to detect as it does not include explicit derogatory or inflammatory words. To address this challenge, task-specific pipelines can be complemented with external knowledge or additional information such as context, emotions and sentiment data. In this paper, we show that, by solely fine-tuning recent general-purpose embedding models based on large language models (LLMs), such as Stella, Jasper, NV-Embed and E5, we achieve state-of-the-art performance. Experiments on multiple IHS datasets show improvements of up to 1.10 percentage points in in-dataset evaluation, and of up to 20.35 percentage points in cross-dataset evaluation, in terms of F1-macro score.
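
The results above are stated as percentage-point gains in F1-macro, so a small worked example of that metric may help. The following is a minimal sketch, not the paper's evaluation code: the function names `f1_macro` and `gain_in_points` and the toy labels are hypothetical and chosen only for illustration. F1-macro is the unweighted mean of the per-class F1 scores, which means the rarer class counts as much as the majority class.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def gain_in_points(new_score, baseline_score):
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return 100.0 * (new_score - baseline_score)


if __name__ == "__main__":
    # Toy labels invented for this example: 1 = implicit hate, 0 = not hate.
    y_true = [1, 1, 0, 0, 0, 0]
    baseline_pred = [0, 0, 0, 0, 0, 0]   # hypothetical baseline misses every positive
    improved_pred = [1, 0, 0, 0, 0, 0]   # hypothetical improved model finds one

    baseline = f1_macro(y_true, baseline_pred)
    improved = f1_macro(y_true, improved_pred)
    print(f"baseline F1-macro: {baseline:.4f}")
    print(f"improved F1-macro: {improved:.4f}")
    print(f"gain: {gain_in_points(improved, baseline):.2f} percentage points")
```

A percentage-point gain is an absolute difference between two scores, not a relative change, so an increase from 0.50 to 0.75 is 25 points rather than 50 percent.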
