CLMay 27, 2025

Assessing and Refining ChatGPT's Performance in Identifying Targeting and Inappropriate Language: A Comparative Study

arXiv:2505.21710v1h-index: 12
Originality Synthesis-oriented
AI Analysis

It addresses the challenge of moderating user-generated content on social networks, offering incremental insights for enhancing automated content moderation systems.

This study evaluated ChatGPT's ability to identify targeting and inappropriate language in online comments, finding it performed well for inappropriate content with improved accuracy in Version 6, but showed variability and higher false positives in targeting detection compared to expert judgments.

This study evaluates the effectiveness of ChatGPT, an advanced AI model for natural language processing, in identifying targeting and inappropriate language in online comments. With the increasing challenge of moderating vast volumes of user-generated content on social network sites, the role of AI in content moderation has gained prominence. We compared ChatGPT's performance against crowd-sourced annotations and expert evaluations to assess its accuracy, scope of detection, and consistency. Our findings highlight that ChatGPT performs well in detecting inappropriate content, showing notable improvements in accuracy through iterative refinements, particularly in Version 6. However, its performance in targeting language detection showed variability, with higher false positive rates compared to expert judgments. This study contributes to the field by demonstrating the potential of AI models like ChatGPT to enhance automated content moderation systems while also identifying areas for further improvement. The results underscore the importance of continuous model refinement and contextual understanding to better support automated moderation and mitigate harmful online behavior.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes