CL AI Nov 18, 2024

ToxiLab: How Well Do Open-Source LLMs Generate Synthetic Toxicity Data?

Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Lin Ai, Yinheng Li, Julia Hirschberg, Congrui Huang

arXiv:2411.15175v42.74 citations h-index: 8 Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for scalable, cost-effective data augmentation for toxic content detection models, though its contribution is incremental: it applies existing methods to new data.

This study evaluates six open-source LLMs on generating synthetic toxicity data for hate speech detection, finding that Mistral consistently outperformed the others and that supervised fine-tuning significantly improved data reliability and diversity.

Effective toxic content detection relies heavily on high-quality and diverse data, which serve as the foundation for robust content moderation models. Synthetic data has become a common approach for training models across various NLP tasks. However, its effectiveness remains uncertain for highly subjective tasks like hate speech detection, with previous research yielding mixed results. This study explores the potential of open-source LLMs for harmful data synthesis, utilizing controlled prompting and supervised fine-tuning techniques to enhance data quality and diversity. We systematically evaluated six open-source LLMs on five datasets, assessing their ability to generate diverse, high-quality harmful data while minimizing hallucination and duplication. Our results show that Mistral consistently outperforms other open models, and supervised fine-tuning significantly enhances data reliability and diversity. We further analyze the trade-offs between prompt-based and fine-tuned toxic data synthesis, discuss real-world deployment challenges, and highlight ethical considerations. Our findings demonstrate that fine-tuned open-source LLMs provide scalable and cost-effective solutions to augment toxic content detection datasets, paving the way for more accessible and transparent content moderation tools.
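
The abstract does not say how duplication and diversity were measured. As a rough illustration only, the sketch below computes two simple proxies often used for generated text: an exact-duplicate rate after whitespace and case normalization, and a distinct-n ratio (unique n-grams over total n-grams). The function names `duplication_rate` and `distinct_n`, the normalization, and the whitespace tokenization are all assumptions for this example, not the paper's method.

```python
from collections import Counter


def duplication_rate(samples):
    """Fraction of samples that exactly repeat an earlier sample.

    Assumption: two samples count as duplicates if they match after
    lowercasing and collapsing whitespace.
    """
    normalized = [" ".join(s.lower().split()) for s in samples]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    # Every occurrence beyond the first of a given text is a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(normalized)


def distinct_n(samples, n=2):
    """Ratio of unique n-grams to total n-grams over all samples.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; a real evaluation would likely use a proper tokenizer.
    """
    ngrams = []
    for s in samples:
        tokens = s.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    # Neutral placeholder strings standing in for generated samples.
    generated = ["a b c", "A  b c", "d e f"]
    print(duplication_rate(generated))
    print(distinct_n(generated, n=2))
```

A lower duplication rate and a higher distinct-n both point toward more varied output, but neither says anything about label correctness or hallucination, which would need separate checks.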
