CL, Aug 19, 2021

How Hateful are Movies? A Study and Prediction on Movie Subtitles

arXiv:2108.10724v1671 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses hate speech detection in movies for content moderation, but it is incremental as it applies existing methods to a new domain.

The researchers tackled hate speech detection in movies by creating a new dataset of annotated movie subtitles and applying transfer learning from social media datasets. Their BERT model achieved a 77% macro-averaged F1-score, demonstrating the efficacy of this approach.

In this research, we investigate techniques to detect hate speech in movies. We introduce a new dataset collected from the subtitles of six movies, where each utterance is annotated as hate, offensive, or normal. We apply the transfer learning techniques of domain adaptation and fine-tuning on existing social media datasets, namely from Twitter and Fox News. We evaluate different representations, i.e., Bag of Words (BoW), Bidirectional Long Short-Term Memory (Bi-LSTM), and Bidirectional Encoder Representations from Transformers (BERT), on 11k movie subtitles. The BERT model obtained the best macro-averaged F1-score of 77%. Hence, we show that transfer learning from the social media domain is efficacious in classifying hate and offensive speech in movies through subtitles.
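The headline metric in the abstract is a macro-averaged F1-score over the three labels (hate, offensive, normal). As a minimal sketch of how such a score is commonly computed, the snippet below takes the unweighted mean of the per-class F1 values. The function name `macro_f1`, the label strings, and the toy predictions are illustrative assumptions; they are not taken from the paper or its code, and the paper's exact evaluation procedure may differ.

```python
from collections import Counter

# Label names are an assumption based on the three classes named in the abstract.
LABELS = ("hate", "offensive", "normal")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (hypothetical helper, not the paper's code)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(labels)


# Toy, made-up example: one "offensive" utterance is misclassified as "normal".
y_true = ["hate", "offensive", "normal", "normal"]
y_pred = ["hate", "normal", "normal", "normal"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a model that ignores a rare class is penalized heavily, which matters when most subtitle utterances are normal.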

Code Implementations: 1 repo
