CL · AI · Apr 23, 2021

Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech

arXiv:2108.01063v1 · 0.55 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for robust hate speech detection models on social media. Its contribution is incremental: it focuses on feature engineering improvements rather than a new paradigm.

The paper tackles online hate speech detection by comparing machine learning and deep learning algorithms. Its best result is an F1 score of 0.9998, obtained with RoBERTa-based sentence embeddings classified by decision trees.
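The reported metric is F1, the harmonic mean of precision and recall. For binary labels it equals 2TP / (2TP + FP + FN). The Python sketch below computes it from scratch to show what the score measures. It assumes binary labels with 1 = hate speech. The label lists are made-up illustrative values, not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) labels: 1 = hate speech, 0 = not hate speech.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # → 0.75 (precision 3/4, recall 3/4)
```

By the same formula, an F1 of 0.9998 means that false positives plus false negatives together amount to only about 0.04% of the true-positive count. In other words, classification is nearly error-free on whatever evaluation set was used.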

In the age of social media, users have become vulnerable to online hate speech. Several attempts have been made to classify hate speech using machine learning, but state-of-the-art models are not robust enough for practical applications. This is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explore various feature engineering techniques, ranging from different embeddings to conventional NLP algorithms, and we also experiment with combinations of different features. Our experiments show that RoBERTa (robustly optimized BERT approach) based sentence embeddings classified using decision trees give the best result, an F1 score of 0.9998. We conclude that BERT-based embeddings provide the most useful features for this problem and could be developed into a practical, robust model.
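The abstract describes a pipeline of this shape: text, then a feature vector (an embedding, a conventional NLP feature, or a concatenation of several), then a decision-tree classifier. The dependency-free Python sketch below illustrates only that shape, under several assumptions:

- `embed_stub` is a hypothetical placeholder for a RoBERTa-based sentence encoder. It returns surface statistics, not real embeddings.
- `VOCAB` is a made-up lexicon for a bag-of-words feature.
- The classifier is a minimal CART-style tree using Gini impurity, written here for illustration. It is not the authors' implementation.
- The training sentences and labels are invented.

A real pipeline would obtain embeddings from a pretrained RoBERTa-based model and would more likely use a library tree such as scikit-learn's `DecisionTreeClassifier`.

```python
VOCAB = ["hate", "stupid", "love", "thanks"]  # hypothetical lexicon


def bow_features(text):
    """Conventional NLP feature: count of each VOCAB word (bag of words)."""
    tokens = text.lower().split()
    return [float(tokens.count(w)) for w in VOCAB]


def embed_stub(text):
    """HYPOTHETICAL stand-in for a RoBERTa sentence embedding (not a real encoder).
    Returns a tiny fixed-length vector of surface statistics."""
    n_chars = max(len(text), 1)
    return [
        float(len(text.split())),                    # token count
        sum(c.isupper() for c in text) / n_chars,    # uppercase ratio
        float(text.count("!")),                      # exclamation marks
    ]


def combined_features(text):
    """Feature combination by concatenation: [bag-of-words | stub embedding]."""
    return bow_features(text) + embed_stub(text)


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue  # split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree: internal nodes are (feature, threshold, left, right); leaves are 0/1."""
    majority = int(sum(y) * 2 >= len(y))  # ties go to class 1
    if depth >= max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, j, thr = split
    li = [i for i, row in enumerate(X) if row[j] <= thr]
    ri = [i for i, row in enumerate(X) if row[j] > thr]
    return (j, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


# Invented toy data: 1 = hate speech, 0 = not hate speech.
train_texts = [
    "I hate you and your stupid friends",
    "you are stupid !",
    "I hate people like you",
    "thanks for the help",
    "I love this community",
    "thanks , great post",
]
train_labels = [1, 1, 1, 0, 0, 0]

tree = build_tree([combined_features(t) for t in train_texts], train_labels)
for text in ["thanks , I love it", "such a stupid take"]:
    print(text, "->", predict(tree, combined_features(text)))
# → thanks , I love it -> 0
# → such a stupid take -> 1
```

Concatenation is the simplest way to combine feature sets, because the tree can split on any dimension from any source. On this toy data it happens to choose the lexical features. With real RoBERTa embeddings, the splits would instead fall on embedding dimensions.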
