CL · Feb 8, 2021

A study of text representations in Hate Speech Detection

arXiv:2102.04521v1 · 8 citations
Originality: Incremental advance
AI Analysis

This research addresses the problem of automatically detecting hate speech on social media platforms, which is crucial for content moderation and combating the spread of harmful language, primarily benefiting platform providers and regulatory bodies.

This study investigates several text representation techniques combined with multiple classification algorithms for automatic Hate Speech detection. The findings indicate that simple hate-keyword frequency features (BoW) perform best, followed by pre-trained word embeddings (GloVe) and N-gram graphs (NGGs). A combination of these representations, paired with Logistic Regression or a 3-layer neural network classifier, achieves the best detection performance.
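To make the idea of hate-keyword frequency features concrete, the following is a minimal sketch, not the authors' implementation. It assumes a small placeholder lexicon (`KEYWORDS`, with made-up entries) and a simple regex tokenizer. The paper's actual keyword list and preprocessing are not given in this summary.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicon; the paper's real keyword list is not shown here.
KEYWORDS = ["slur_a", "slur_b", "insult_c"]


def keyword_frequency_vector(text, keywords=KEYWORDS):
    """Return one count per lexicon keyword, in lexicon order."""
    # Lowercase and split into simple word-like tokens (an assumed tokenizer).
    tokens = re.findall(r"[a-z0-9_']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for keywords that never occur in the text.
    return [counts[keyword] for keyword in keywords]


example_vector = keyword_frequency_vector("Slur_a you slur_a and insult_c")
```

Each text is mapped to a fixed-length vector with one dimension per keyword, which could then be passed to any standard classifier such as Logistic Regression.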

The pervasiveness of the Internet and social media has enabled the rapid and anonymous spread of Hate Speech content on microblogging platforms such as Twitter. Current EU and US legislation against hateful language, in conjunction with the large amount of data produced on these platforms, has made automatic tools a necessary component of the Hate Speech detection task and pipeline. In this study, we examine the performance of several diverse text representation techniques, paired with multiple classification algorithms, on the automatic Hate Speech detection and abusive language discrimination task. We perform an experimental evaluation on binary and multiclass datasets, paired with significance testing. Our results show that simple hate-keyword frequency features (BoW) work best, followed by pre-trained word embeddings (GloVe) and N-gram graphs (NGGs), a graph-based representation that proved to produce efficient, very low-dimensional but rich features for this task. A combination of these representations paired with Logistic Regression or 3-layer neural network classifiers achieved the best detection performance in terms of micro and macro F-measure.
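Since results are reported in micro and macro F-measure, a short sketch of how the two averages differ may help. This is an illustrative pure-Python version for single-label classification. The function name `f_measures` and the example labels are assumptions, not taken from the paper.

```python
from collections import Counter


def f_measures(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_count, fp_count, fn_count):
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class weighs equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical labels for illustration only.
y_true = ["hate", "hate", "offensive", "clean", "clean", "clean"]
y_pred = ["hate", "offensive", "offensive", "clean", "clean", "hate"]
micro_f1, macro_f1 = f_measures(y_true, y_pred)
```

Macro averaging gives rare classes the same weight as frequent ones, which matters on imbalanced Hate Speech datasets. Micro averaging is dominated by the most frequent classes.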
