LG · Jun 24, 2021

Hate Speech Detection in Clubhouse

Hadi Mansourifar, Dana Alsagheer, Reza Fathi, Weidong Shi, Lan Ni, Yan Huang

arXiv:2106.13238v3 · 1.63 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses hate speech moderation for social media platforms, but its contribution is incremental: it applies existing methods to a new data source.

The paper tackles hate speech detection in Clubhouse voice chat rooms by collecting the first dataset from this platform and showing that Google Perspective Scores outperform Bag of Words and Word2Vec as text features in experiments.

With the rise of voice chat rooms, a vast body of data can be made available to the research community for natural language processing tasks. Moderators in voice chat rooms actively monitor discussions and remove participants who use offensive language. However, this moderation makes hate speech detection even more difficult, since some participants find creative ways to articulate hate speech. As a result, hate speech detection is challenging on new social media platforms like Clubhouse. To the best of our knowledge, all existing hate speech datasets have been collected from text sources like Twitter. In this paper, we take the first step by collecting a significant dataset from Clubhouse, a rising star in the social media industry. We analyze the collected instances from a statistical point of view using Google Perspective Scores. Our experiments show that Perspective Scores can outperform Bag of Words and Word2Vec as high-level text features.
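To make the feature comparison in the abstract concrete, the sketch below contrasts the two kinds of representation: a Bag of Words vector, whose length grows with the vocabulary, and a short fixed-length vector of Perspective attribute scores. This is only an illustration, not the authors' pipeline. The function names, the toy sentences, and the choice of six attributes are assumptions, and the score values are placeholders rather than real Perspective API output.

```python
from collections import Counter

# Perspective attribute names; which ones the paper actually uses is an assumption here.
ATTRIBUTES = ("TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK",
              "INSULT", "PROFANITY", "THREAT")


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector with one entry per vocabulary token."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector


def perspective_features(scores):
    """Order a dict of attribute scores into a fixed-length vector.

    `scores` stands in for values already obtained from the Perspective API;
    attributes missing from the dict default to 0.0.
    """
    return [scores.get(name, 0.0) for name in ATTRIBUTES]


# Hypothetical toy utterances, not taken from the Clubhouse dataset.
docs = ["you are great", "you are you"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]

# Placeholder scores for illustration only.
features = perspective_features({"TOXICITY": 0.9, "INSULT": 0.5})
```

The Bag of Words vectors here have one dimension per distinct token, so their size depends on the corpus, while the Perspective vector always has one dimension per attribute. Either representation could then be passed to the same downstream classifier for comparison.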
