CL · AI · Dec 17, 2024

Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph

Yibo Zhao, Jiapeng Zhu, Can Xu, Yao Liu, Xiang Li

arXiv:2412.15268v46.113 citations · h-index: 4 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses online content toxicity detection for social media platforms, offering an incremental improvement by enhancing existing LLM methods with domain-specific knowledge.

The paper tackles false negatives and false positives in LLM-based toxicity detection by proposing MetaTox, a method that retrieves knowledge from a meta-toxic knowledge graph. The authors report a significantly lower false positive rate and improved overall detection performance.

The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false positives, limiting freedom of speech. To address these issues, we propose a novel method called MetaTox, leveraging graph search on a meta-toxic knowledge graph to enhance hatred and toxicity detection. First, we construct a comprehensive meta-toxic knowledge graph by utilizing LLMs to extract toxic information through a three-step pipeline, with toxic benchmark datasets serving as corpora. Second, we query the graph via retrieval and ranking processes to supplement accurate, relevant toxic knowledge. Extensive experiments and in-depth case studies across multiple datasets demonstrate that our MetaTox significantly decreases the false positive rate while boosting overall toxicity detection performance. Our code is available at https://github.com/YiboZhao624/MetaTox.
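The retrieval and ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). The toy graph, the token-overlap scoring, and the names `TOY_GRAPH`, `retrieve`, `rank`, and `build_context` are all assumptions made for illustration, with placeholder terms standing in for real toxic vocabulary.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy graph: placeholder entries, not taken from the paper.
TOY_GRAPH: List[Triple] = [
    ("term_a", "is_slur_targeting", "group_x"),
    ("term_b", "is_benign_slang_for", "friend"),
    ("term_c", "is_coded_insult_for", "group_y"),
]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {tok.strip(".,!?").lower() for tok in text.split()} - {""}


def retrieve(graph: List[Triple], query: str) -> List[Triple]:
    """Retrieval (assumed): keep triples whose head or tail occurs in the query."""
    tokens = tokenize(query)
    return [t for t in graph if t[0] in tokens or t[2] in tokens]


def rank(candidates: List[Triple], query: str, top_k: int = 2) -> List[Triple]:
    """Ranking (assumed): prefer triples with more endpoints matched in the query."""
    tokens = tokenize(query)

    def score(t: Triple) -> int:
        return int(t[0] in tokens) + int(t[2] in tokens)

    return sorted(candidates, key=score, reverse=True)[:top_k]


def build_context(triples: List[Triple]) -> str:
    """Format ranked triples as text that could be prepended to an LLM prompt."""
    return "\n".join(f"{h} --{r}--> {t}" for h, r, t in triples)


if __name__ == "__main__":
    post = "Is term_a aimed at group_x or term_c?"
    print(build_context(rank(retrieve(TOY_GRAPH, post), post)))
```

In a real system the overlap score would likely be replaced by a learned or embedding-based relevance measure; the sketch only shows how retrieved graph facts could be narrowed down before being passed to the detector.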

