An Effective, Robust and Fairness-aware Hate Speech Detection Framework
This work addresses the need for accurate, robust, and fair hate speech classification in online social networks, representing an incremental improvement over existing methods.
The paper tackles the problem of hate speech detection by proposing a framework that addresses data insufficiency, model uncertainty, robustness against attacks, and fairness, achieving state-of-the-art performance by outperforming eight methods in both no-attack and attack scenarios.
With the widespread use of online social networks, hate speech is spreading faster and causing more damage than ever before. Existing hate speech detection methods are limited in several respects: handling data insufficiency, estimating model uncertainty, improving robustness against malicious attacks, and handling unintended bias (i.e., fairness). There is an urgent need for accurate, robust, and fair hate speech classification in online social networks. To bridge this gap, we design a novel framework that incorporates data augmentation, fairness handling, and uncertainty estimation. As part of the framework, we propose Bidirectional Quaternion-Quasi-LSTM layers to balance effectiveness and efficiency. To build a generalized model, we combine five datasets collected from three platforms. Experimental results show that our model outperforms eight state-of-the-art methods under both the no-attack scenario and various attack scenarios, indicating the effectiveness and robustness of our model. We share our code along with the combined dataset to support future research.