CL Feb 15, 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation
Shiza Ali, Jeremy Blackburn, Gianluca Stringhini
The proliferation of social media platforms has led to an increase in the spread of hate speech, particularly targeting vulnerable communities. Unfortunately, existing methods for automatically identifying and blocking toxic language rely on pre-constructed lexicons, making them reactive rather than adaptive. As such, these approaches become less effective over time, especially when new communities are targeted with slurs not included in the original datasets. To address this issue, we present an adaptive approach that uses word embeddings to update lexicons and develop a hybrid model that adjusts to emerging slurs and new linguistic patterns. This approach can effectively detect toxic language, including intentional spelling mistakes employed by aggressors to avoid detection. Our hybrid model, which combines BERT with lexicon-based techniques, achieves an accuracy of 95% for most state-of-the-art datasets. Our work has significant implications for creating safer online environments by improving the detection of toxic content and proactively updating the lexicon. Content Warning: This paper contains examples of hate speech that may be triggering.
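To make the idea of updating a lexicon with word embeddings concrete, the following is a minimal sketch, not the authors' implementation. It assumes that terms whose vectors lie close to a known lexicon entry (by cosine similarity) are candidates for inclusion. The function names, the 0.9 threshold, and the toy vectors are all hypothetical; a real system would use trained embeddings and would pair this step with a classifier such as BERT.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(lexicon, embeddings, threshold=0.9):
    """Return the lexicon plus every vocabulary term close to a seed term.

    `threshold` is an illustrative value, not one taken from the paper.
    Only the original seed terms are used for comparison (a single pass).
    """
    expanded = set(lexicon)
    for term, vec in embeddings.items():
        if term in expanded:
            continue
        for seed in lexicon:
            if seed in embeddings and cosine(vec, embeddings[seed]) >= threshold:
                expanded.add(term)
                break
    return expanded


# Toy 3-dimensional vectors (hypothetical). "b4dword" stands in for an
# intentionally misspelled variant that appears in contexts similar to
# the seed term and therefore receives a nearby vector.
toy_embeddings = {
    "badword": [0.90, 0.10, 0.00],
    "b4dword": [0.88, 0.12, 0.02],
    "hello": [0.00, 0.20, 0.95],
}
seed_lexicon = {"badword"}
updated = expand_lexicon(seed_lexicon, toy_embeddings)
```

In this toy setup the misspelled variant joins the lexicon while the unrelated word does not. Rerunning the expansion as embeddings are retrained on fresh data is one way a lexicon could track emerging terms.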
CR Dec 1, 2021
TROLLMAGNIFIER: Detecting State-Sponsored Troll Accounts on Reddit
Mohammad Hammas Saeed, Shiza Ali, Jeremy Blackburn et al.
Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies more that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the exact same day as a known troll, they have since been suspended by Reddit, etc.). They also discuss topics similar to those of the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%.
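The figures in this abstract can be checked with simple arithmetic. The sketch below assumes that the "over 300%" growth refers to the ratio of newly detected accounts to the 335 known seed accounts; that reading is an interpretation, not something the abstract states explicitly. The corroborated count is approximate because the 66% figure is itself rounded.

```python
known_trolls = 335      # seed accounts identified by Reddit (from the abstract)
detected = 1248         # potential troll accounts flagged (from the abstract)
flagged_share = 0.66    # share showing signs of instrumentation (rounded in the abstract)

# Newly detected accounts as a percentage of the known seed set.
growth_pct = detected / known_trolls * 100

# Approximate number of detected accounts with corroborating signals.
corroborated = round(detected * flagged_share)

print(f"growth relative to seed set: {growth_pct:.1f}%")
print(f"approx. corroborated accounts: {corroborated}")
```

Under this reading, the detected set is roughly 3.7 times the size of the seed set, which is consistent with the stated growth of over 300%.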