CL | AI | Oct 26, 2023

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang

arXiv:2310.17389v129.8253 citations | h-index: 10 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of maintaining non-toxic environments in user-AI interactions, which is critical for chatbot safety. Its contribution is incremental, however, because it focuses on benchmarking rather than proposing new detection methods.

The paper tackles the problem of toxicity detection in real-world user-AI conversations by introducing ToxicChat, a benchmark based on real user queries from an open-source chatbot, revealing significant domain differences and shortcomings of existing models when applied to this domain.

Despite remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical nowadays. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbot. This benchmark contains the rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference compared to social media content. Our systematic evaluation of models trained on existing toxicity datasets has shown their shortcomings when applied to this unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.
