CL | AI | Apr 23, 2021

Comparative Analysis of Machine Learning and Deep Learning Algorithms for Detection of Online Hate Speech

arXiv:2108.01063v1 | 5 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for robust hate speech detection on social media, but it is incremental: it focuses on feature engineering improvements rather than a new paradigm.

The paper compares machine learning and deep learning algorithms for detecting online hate speech, reporting a best F1 score of 0.9998 with RoBERTa-based sentence embeddings classified by decision trees.

In the age of social media, users are increasingly exposed to online hate speech. Several attempts have been made to classify hate speech using machine learning, but the state-of-the-art models are not robust enough for practical applications. This is attributed to the use of primitive NLP feature engineering techniques. In this paper, we explore various feature engineering techniques, ranging from different embeddings to conventional NLP algorithms. We also experiment with combinations of different features. Our experiments show that RoBERTa (Robustly Optimized BERT Approach) based sentence embeddings classified using decision trees give the best result, an F1 score of 0.9998. We conclude that BERT-based embeddings give the most useful features for this problem and have the capacity to be made into a practical, robust model.
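The pipeline described in the abstract (sentence embeddings fed to a decision tree, scored by F1) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the embedding model, dataset, or hyperparameters, so the names `hashing_embedder` and `evaluate`, the placeholder texts, and all settings are assumptions. To stay runnable offline, the sketch uses a hashed bag-of-words stand-in where a RoBERTa sentence encoder would go.

```python
from typing import Callable, Sequence

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Any function mapping a list of sentences to a (n_sentences, dim) array.
Embedder = Callable[[Sequence[str]], np.ndarray]


def hashing_embedder(texts: Sequence[str]) -> np.ndarray:
    """Offline stand-in for a sentence encoder (assumption: NOT RoBERTa)."""
    vectorizer = HashingVectorizer(n_features=64, alternate_sign=False)
    return vectorizer.transform(texts).toarray()


def evaluate(embed: Embedder, texts: Sequence[str], labels: Sequence[int],
             seed: int = 0) -> float:
    """Embed the texts, fit a decision tree, and return F1 on a held-out split."""
    features = embed(texts)
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    classifier = DecisionTreeClassifier(random_state=seed)
    classifier.fit(x_train, y_train)
    return f1_score(y_test, classifier.predict(x_test))


# Hypothetical placeholder data: 1 = flagged message, 0 = benign message.
texts = [f"hostile rude aggressive placeholder message {i:02d}" for i in range(8)]
texts += [f"friendly kind polite placeholder message {i:02d}" for i in range(8)]
labels = [1] * 8 + [0] * 8

score = evaluate(hashing_embedder, texts, labels)
print(f"held-out F1 on toy data: {score:.4f}")

# To approximate the paper's setup, swap in a RoBERTa-based sentence encoder,
# for example (assumption: the sentence-transformers package is installed and
# the model can be downloaded; the paper may have used a different model):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-roberta-large-v1")
#   score = evaluate(lambda t: model.encode(list(t)), texts, labels)
```

Passing the embedder as a function keeps the classifier and the metric fixed while the features change, which mirrors the kind of feature comparison the abstract describes. A score on toy data like this says nothing about real-world performance.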

