ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI
This work addresses nuanced abuse detection for conversational AI developers, but its contribution is incremental: it focuses on data collection and benchmarking and introduces no new methods.
The study tackles the detection of abusive language in conversational AI by creating the first English corpus from real-world interactions with three AI systems. It finds that the distribution of abuse differs markedly from that of existing datasets, with more sexually tinted aggression. Benchmarking existing models on the corpus yields F1 scores below 90%, indicating substantial room for improvement.
We present the first English corpus study on abusive language towards three conversational AI systems gathered "in the wild": an open-domain social bot, a rule-based chatbot, and a task-based system. To account for the complexity of the task, we take a more 'nuanced' approach where our ConvAI dataset reflects fine-grained notions of abuse, as well as views from multiple expert annotators. We find that the distribution of abuse is vastly different compared to other commonly used datasets, with more sexually tinted aggression towards the virtual persona of these systems. Finally, we report results from benchmarking existing models against this data. Unsurprisingly, we find that there is substantial room for improvement, with F1 scores below 90%.