Towards A Friendly Online Community: An Unsupervised Style Transfer Framework for Profanity Redaction
This addresses the issue of abusive content on social media platforms, but the contribution is incremental because it builds on existing style transfer methods.
The paper tackles offensive language on social media by proposing an unsupervised style transfer framework that redacts profanity while preserving fluency and content. The framework outperforms other models in human evaluations and performs consistently well on all automatic metrics.
Offensive and abusive language is a pressing problem on social media platforms. In this work, we propose a method for transforming offensive comments (statements containing profanity or offensive language) into non-offensive ones. We design a RETRIEVE, GENERATE and EDIT unsupervised style transfer pipeline that redacts offensive comments in a word-restricted manner while maintaining a high level of fluency and preserving the content of the original text. We extensively evaluate our method and compare it to previous style transfer models using both automatic metrics and human evaluations. Experimental results show that our method outperforms other models on human evaluations and is the only approach that consistently performs well on all automatic evaluation metrics.
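To make the phrase "word-restricted" concrete, the following is a minimal sketch of the constraint only, not the RETRIEVE, GENERATE and EDIT pipeline itself. It assumes a hypothetical lexicon `OFFENSIVE` and a hypothetical lookup table `SUBSTITUTES` that stands in for whatever component produces fluent replacements. Only flagged tokens are changed, and the fraction of untouched tokens serves as a crude proxy for content preservation.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical lexicon and substitutions, for illustration only; they are
# not taken from the paper and stand in for learned components.
OFFENSIVE: Set[str] = {"damn", "stupid"}
SUBSTITUTES: Dict[str, str] = {"stupid": "silly", "damn": "darn"}


def tokenize(text: str) -> List[str]:
    # Split into word tokens and single punctuation marks (an assumption).
    return re.findall(r"\w+|[^\w\s]", text)


def redact(tokens: List[str], fill: Callable[[str], str]) -> List[str]:
    # Word-restricted edit: replace only tokens found in the lexicon and
    # copy every other token through unchanged.
    return [fill(t) if t.lower() in OFFENSIVE else t for t in tokens]


def preserved_fraction(src: List[str], out: List[str]) -> float:
    # Share of positions left untouched; redact() swaps one token for one
    # token, so source and output positions line up.
    if not src:
        return 1.0
    return sum(a == b for a, b in zip(src, out)) / len(src)


if __name__ == "__main__":
    src = tokenize("That was a stupid idea, damn it.")
    out = redact(src, fill=lambda w: SUBSTITUTES.get(w.lower(), "[REDACTED]"))
    print(" ".join(out))
    print(round(preserved_fraction(src, out), 3))
```

Keeping the edit one-for-one at the token level is what makes the preservation score a simple positional comparison. A generator that rewrites whole phrases would need an alignment-based or embedding-based measure instead.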