CL, AI · Dec 10, 2025

System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection

arXiv:2512.09563v1 · h-index: 1
Originality: Incremental advance
AI Analysis

The paper addresses hate speech on Chinese social media, an urgent societal problem that matters directly to content moderators. The advance is incremental because the framework builds on existing LLM methods.

The paper tackles fine-grained Chinese hate speech detection with a three-stage LLM-based framework (prompt engineering, supervised fine-tuning, and LLM merging) and reports superior performance over baseline methods on the STATE-ToxiCN benchmark.

The proliferation of hate speech on Chinese social media poses urgent societal risks, yet traditional systems struggle to decode context-dependent rhetorical strategies and evolving slang. To bridge this gap, we propose a novel three-stage LLM-based framework: Prompt Engineering, Supervised Fine-tuning, and LLM Merging. First, context-aware prompts are designed to guide LLMs in extracting implicit hate patterns. Next, task-specific features are integrated during supervised fine-tuning to enhance domain adaptation. Finally, merging fine-tuned LLMs improves robustness against out-of-distribution cases. Evaluations on the STATE-ToxiCN benchmark validate the framework's effectiveness, demonstrating superior performance over baseline methods in detecting fine-grained hate speech.
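The abstract does not say how the fine-tuned LLMs are merged. As an illustration only, the sketch below assumes the simplest common approach, a weighted average of parameters across checkpoints that share the same architecture. The function name merge_state_dicts, the plain-list parameter representation, and the weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
StateDict = Dict[str, List[float]]


def merge_state_dicts(models: Sequence[StateDict],
                      weights: Sequence[float]) -> StateDict:
    """Merge checkpoints by weighted parameter averaging.

    Assumption: all checkpoints come from the same architecture, so they
    share parameter names and shapes. This is a generic merging baseline,
    not necessarily the method used in the paper.
    """
    if not models:
        raise ValueError("at least one model is required")
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalise so the merged parameters stay on the same scale.
    norm = [w / total for w in weights]

    reference_keys = set(models[0])
    for model in models[1:]:
        if set(model) != reference_keys:
            raise ValueError("checkpoints have different parameter names")

    merged: StateDict = {}
    for name in models[0]:
        length = len(models[0][name])
        if any(len(model[name]) != length for model in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(w * model[name][i] for w, model in zip(norm, models))
            for i in range(length)
        ]
    return merged


if __name__ == "__main__":
    # Two toy "fine-tuned" checkpoints with a single parameter each.
    model_a = {"classifier.weight": [1.0, 3.0]}
    model_b = {"classifier.weight": [3.0, 5.0]}
    print(merge_state_dicts([model_a, model_b], [1.0, 1.0]))
```

With real checkpoints the same loop would run over tensors rather than Python lists, and the weights could be tuned on a held-out validation split.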

