CL · AI · May 8, 2023

Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks

arXiv:2305.04446v1 · 242 citations
Originality: Incremental advance
AI Analysis

This work addresses the lag in Chinese toxic language detection by providing resources and benchmarks for fine-grained analysis. The advance is incremental and specific to this domain.

The paper tackles toxic language detection in Chinese by introducing a hierarchical taxonomy, a fine-grained dataset that includes indirectly toxic samples, and a benchmark that incorporates lexical knowledge. It also demonstrates the effectiveness of the proposed Toxic Knowledge Enhancement (TKE) method.

The widespread dissemination of toxic online posts is increasingly damaging to society. However, research on detecting toxic language in Chinese has lagged significantly. Existing datasets lack fine-grained annotation of toxic types and expressions, and ignore samples with indirect toxicity. In addition, it is crucial to introduce lexical knowledge to detect the toxicity of posts, which has been a challenge for researchers. In this paper, we facilitate the fine-grained detection of Chinese toxic language. First, we build Monitor Toxic Frame, a hierarchical taxonomy to analyze toxic types and expressions. Then, we present ToxiCN, a fine-grained dataset that includes both direct and indirect toxic samples. We also build an insult lexicon containing implicit profanity and propose Toxic Knowledge Enhancement (TKE) as a benchmark, incorporating lexical features to detect toxic language. In the experimental stage, we demonstrate the effectiveness of TKE. Finally, we give a systematic quantitative and qualitative analysis of the findings.
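To make "incorporating lexical features" concrete, here is a minimal sketch of one way a lexicon could be turned into a per-character feature. It is not the authors' TKE implementation. The function name `lexicon_flags` and the toy lexicon are hypothetical, and the plain substring matching is an assumption made for illustration.

```python
from typing import Iterable, List


def lexicon_flags(text: str, lexicon: Iterable[str]) -> List[int]:
    """Return one 0/1 flag per character of `text`.

    A flag is 1 when the character lies inside any occurrence of a
    lexicon term, else 0. This is an illustrative feature only, not
    the method described in the paper.
    """
    flags = [0] * len(text)
    for term in lexicon:
        if not term:
            continue  # an empty term would match at every position
        start = text.find(term)
        while start != -1:
            for i in range(start, start + len(term)):
                flags[i] = 1
            # Advance by one character so overlapping occurrences are found.
            start = text.find(term, start + 1)
    return flags


# Hypothetical placeholder lexicon and post, not taken from the paper's resources.
toy_lexicon = {"bad", "awful"}
flags = lexicon_flags("a bad day", toy_lexicon)
```

In this toy example, `flags` marks only the three characters of "bad". A classifier could, under the same assumptions, map such flags to an extra embedding that is combined with its token representations. How TKE actually does this is described in the paper itself.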

Code Implementations: 1 repo
