CY, CL, IR, LG | Oct 8, 2019

Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas

arXiv:1910.03206v2 | 38 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of research on social media analysis for the Rohingya crisis, aiming to help marginalized communities by detecting supportive speech, though it is incremental in applying existing methods to a new domain.

The paper tackles the problem of detecting comments that support the Rohingyas on social media. It builds a classifier using active learning and a novel nearest-neighbor sampling strategy, working from a corpus of 263,482 YouTube comments.

The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times with more than 600,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large evolving crisis. In this work, we construct a substantial corpus of YouTube video comments (263,482 comments from 113,250 users in 5,153 relevant videos) with an aim to analyze the possible role of AI in helping a marginalized community. Using a novel combination of multiple Active Learning strategies and a novel active sampling strategy based on nearest-neighbors in the comment-embedding space, we construct a classifier that can detect comments defending the Rohingyas among larger numbers of disparaging and neutral ones. We advocate that beyond the burgeoning field of hate-speech detection, automatic detection of help-speech can lend voice to the voiceless people and make the internet safer for marginalized communities.
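The nearest-neighbor sampling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `nearest_neighbor_sample`, the use of cosine similarity, and the toy 2-D vectors are all assumptions made for illustration. The sketch takes embeddings of a few comments already labeled as supportive (the seeds) and returns the unlabeled comments closest to any of them, which would then be sent for annotation.

```python
import numpy as np

def nearest_neighbor_sample(seed_vecs, pool_vecs, k=2):
    """Return indices of the k pool items most similar to any seed item.

    Hypothetical helper: cosine similarity is an assumed distance choice,
    not one confirmed by the abstract.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    seeds = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    pool = pool_vecs / np.linalg.norm(pool_vecs, axis=1, keepdims=True)
    sims = pool @ seeds.T            # shape: (n_pool, n_seeds)
    best = sims.max(axis=1)          # each pool item's closest seed
    order = np.argsort(-best, kind="stable")  # most similar first
    return order[:k].tolist()

# Toy 2-D stand-ins for comment embeddings (illustrative values only).
seed_vecs = np.array([[1.0, 0.0]])       # one labeled supportive comment
pool_vecs = np.array([
    [0.9, 0.1],    # close to the seed
    [0.0, 1.0],    # orthogonal to the seed
    [0.8, 0.3],    # fairly close to the seed
    [-1.0, 0.0],   # opposite direction
])

picked = nearest_neighbor_sample(seed_vecs, pool_vecs, k=2)
print(picked)  # indices of pool comments to send for annotation
```

In an active learning loop, the annotated neighbors would be added to the labeled set and the classifier retrained. Sampling near known positives is one plausible way to find rare supportive comments among many disparaging and neutral ones.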
