David Broniatowski

2 Papers

CL · May 8, 2020
Detecting East Asian Prejudice on Social Media

Bertie Vidgen, Austin Botelho, David Broniatowski et al.

The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostility directed against East Asia. In this paper we report on the creation of a classifier that detects and categorizes Twitter posts into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice, and a neutral class. The classifier achieves an F1 score of 0.83 across all four classes. We provide our final model (coded in Python), a new 20,000-tweet training dataset used to build the classifier, two analyses of hashtags associated with East Asian prejudice, and the annotation codebook. Other researchers can apply the classifier to support both online content moderation and further research into the dynamics, prevalence and impact of East Asian prejudice online during this global pandemic.
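As a rough illustration of what an F1 score over four classes can mean, the sketch below computes per-class F1 and their unweighted (macro) average in plain Python. The abstract does not say which averaging the authors used, so macro averaging is an assumption here, and the short label names and toy predictions are invented for the example.

```python
# Hypothetical short names for the four classes described in the abstract.
LABELS = ["hostility", "criticism", "meta", "neutral"]

def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that never appears and is never predicted scores 0.0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)

# Toy data, invented for illustration only.
y_true = ["hostility", "criticism", "meta", "neutral"]
y_pred = ["hostility", "criticism", "meta", "hostility"]
print(per_class_f1(y_true, y_pred, LABELS))
print(macro_f1(y_true, y_pred, LABELS))
```

Macro averaging weights every class equally, which matters when one class (typically the neutral one) dominates the data; a micro or weighted average could give a noticeably different number on the same predictions.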

CY · Apr 18, 2020
Automatically Characterizing Targeted Information Operations Through Biases Present in Discourse on Twitter

Autumn Toney, Akshat Pandey, Wei Guo et al.

This paper considers the problem of using artificial intelligence to automatically characterize the overall attitudes and biases that may be associated with emerging information operations. Accurate analysis of these emerging topics usually requires experts to laboriously annotate millions of tweets by hand to identify biases in new topics. We extend the Word Embedding Association Test (Caliskan et al., 2017) to a new domain. Our practical, unsupervised method quantifies the biases promoted in information operations. We validate the method on known information-operation tweets from Twitter's Transparency Report. We then perform a case study on the COVID-19 pandemic to evaluate the method's performance on unlabeled Twitter data, demonstrating its usability in emerging domains.
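For readers unfamiliar with the Word Embedding Association Test, the sketch below computes the original WEAT effect size as defined by Caliskan et al., not the extensions this paper introduces. It assumes word embeddings are already available as plain lists of floats; the function names and the two-dimensional toy vectors are invented for illustration, and the sample standard deviation is one common implementation choice.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y,
    normalized by the standard deviation over all target words."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Sample standard deviation (ddof=1) is an assumption of this sketch.
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)

# Toy 2-D "embeddings", invented for illustration only.
X = [[1.0, 0.0], [0.9, 0.1]]   # target set 1
Y = [[0.0, 1.0], [0.1, 0.9]]   # target set 2
A = [[1.0, 0.0]]               # attribute set 1
B = [[0.0, 1.0]]               # attribute set 2
print(weat_effect_size(X, Y, A, B))
```

A positive value indicates that the words in X sit closer to attribute set A (and those in Y closer to B) in the embedding space; swapping X and Y flips the sign.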