CL CY LG · Jan 7, 2020

RECAST: Interactive Auditing of Automatic Toxicity Detection Models

Austin P. Wright, Omar Shaikh, Haekyu Park, Will Epperson, Muhammed Ahmed, Stephane Pinel, Diyi Yang, Duen Horng Chau

arXiv:2001.01819v2 · 7 citations

AI Analysis

This work addresses the need for better auditing tools for developers and users of toxicity detection systems. The contribution is incremental, as it builds on existing model examination techniques.

The paper tackles the problem of auditing automatic toxicity detection models by introducing RECAST, an interactive tool that visualizes prediction explanations and provides alternative wordings for toxic speech, addressing fairness, robustness, and explainability concerns.

As toxic language becomes nearly pervasive online, there has been increasing interest in leveraging advances in natural language processing (NLP), such as very large transformer models, to automatically detect and remove toxic comments. Despite fairness concerns, a lack of adversarial robustness, and limited prediction explainability in deep learning systems, there is currently little work on auditing these systems and on helping both developers and users understand how they work. We present our ongoing work, RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for predictions and providing alternative wordings for detected toxic speech.
