Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer
This paper addresses offensive content, a real problem for social media users and platforms, but the contribution is incremental because it builds on existing text style transfer techniques.
The paper tackles offensive language on social media by using unsupervised text style transfer to convert offensive sentences into non-offensive ones, and shows that the proposed method outperforms a state-of-the-art system on two out of three metrics on Twitter and Reddit data.
We introduce a new approach to tackle the problem of offensive language in online social media. Our approach uses unsupervised text style transfer to translate offensive sentences into non-offensive ones. We propose a new method for training encoder-decoders using non-parallel data that combines a collaborative classifier, attention and the cycle consistency loss. Experimental results on data from Twitter and Reddit show that our method outperforms a state-of-the-art text style transfer system in two out of three quantitative metrics and produces reliable non-offensive transferred sentences.
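As a rough illustration of how a classifier term and a cycle consistency term could be combined into a single training objective on non-parallel data, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the weights `cycle_weight` and `class_weight`, and the simple weighted sum are assumptions made for illustration, and the attention mechanism and the encoder-decoder itself are left out. The inputs are assumed to be probabilities already produced by some model.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target under one probability distribution."""
    return -math.log(probs[target_index])


def sequence_nll(step_probs, target_ids):
    """Mean token-level cross-entropy over a whole sequence."""
    losses = [cross_entropy(p, t) for p, t in zip(step_probs, target_ids)]
    return sum(losses) / len(losses)


def combined_loss(cycle_step_probs, source_ids, classifier_probs, target_style,
                  cycle_weight=1.0, class_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's exact loss).

    cycle_step_probs: per-token distributions produced when the transferred
        sentence is translated back to the original style.
    source_ids: token ids of the original sentence (the back-transfer target).
    classifier_probs: the style classifier's distribution over styles for the
        transferred sentence.
    target_style: index of the desired style (e.g. non-offensive).
    """
    # Cycle consistency: back-transfer should reconstruct the source sentence.
    cycle = sequence_nll(cycle_step_probs, source_ids)
    # Classifier term: the transferred sentence should be judged as the target style.
    style = cross_entropy(classifier_probs, target_style)
    return cycle_weight * cycle + class_weight * style


# Toy usage: a 2-token sentence over a 2-word vocabulary, 2 styles.
uniform = combined_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], [0.5, 0.5], 1)
perfect = combined_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], [0.0, 1.0], 1)
print(uniform, perfect)
```

In this toy setup, a model that is maximally uncertain pays a penalty on both terms, while one that reconstructs the source exactly and is classified as the target style incurs zero loss. The weighted sum is only one way to balance the two terms.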