SI · Apr 17
The Power of Social Norms: How Initial Responses to Toxicity Shape Conversations on Twitter
Ana Aleksandric, Mohit Singhal, Anne Groggel et al.
Online harassment and abusive language continue to be a growing concern on social media platforms. In this study, we explore the power of group dynamics to shape the toxicity of Twitter conversations. First, we examine how the presence of others in a conversation can potentially diffuse Twitter users' responsibility to address a toxic reply. Second, we examine whether the toxicity of the first direct reply to a toxic tweet in conversations establishes group norms for subsequent replies. By doing so, we identify the number of users participating in the conversation before the first toxic reply and the tone of initial responses to a toxic reply as explanatory factors that affect whether others feel uninhibited to post their own abusive or derogatory replies. We test this premise by analyzing a random sample of more than 187K tweets belonging to ~9K conversations. This analysis of group dynamics is motivated by a larger body of scholarship on contagion of antisocial behavior and the power of establishing social norms that maintain rather than sanction toxicity. We find evidence that an increased number of users participating in the conversation before receiving a toxic tweet is negatively associated with the number of users who responded to the toxic reply in a non-toxic way. Furthermore, posting a toxic reply immediately after a toxic comment is negatively associated with users posting non-toxic replies and is associated with Twitter conversations becoming increasingly toxic. We argue that understanding how social media users respond to uncivil comments or abusive language reveals social norms as powerful social cues that can shape human behavior online.
SI · Jan 19
The Tag is the Signal: URL-Agnostic Credibility Scoring for Messages on Telegram
Yipeng Wang, Huy Gia Han Vu, Mohit Singhal
Telegram has become one of the leading platforms for disseminating misinformation. However, many existing pipelines still classify each message's credibility based on the reputation of its associated domain names or its lexical features. Such methods work well on traditional long-form news articles published by well-known sources, but high-risk posts on Telegram are short and URL-sparse, leading to failures for link-based and standard TF-IDF models. To this end, we propose the TAG2CRED pipeline, a method designed for such short, convoluted messages. Our model directly scores each post based on the tags assigned to the text. We designed a concise label system that covers the dimensions of theme, claim type, call to action, and evidence. A fine-tuned large language model (LLM) assigns tags to messages, and these tags are then mapped to calibrated risk scores in the [0,1] interval through L2-regularized logistic regression. We evaluated on 87,936 Telegram messages associated with Media Bias/Fact Check (MBFC), using URL masking and domain-disjoint splits. The TAG2CRED model reached a ROC-AUC of 0.871, a macro-F1 of 0.787, and a Brier score of 0.167, outperforming the TF-IDF baseline (macro-F1 0.737, Brier score 0.248); at the same time, it uses far fewer features and generalizes better to infrequent domains. A stacked ensemble model (TF-IDF + TAG2CRED + SBERT) further improved over the SBERT baseline, reaching a ROC-AUC of 0.901 and a macro-F1 of 0.813 (Brier score 0.114). This indicates that style labels and lexical features may capture different but complementary dimensions of information risk.
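The tag-to-score step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the tag vocabulary, the toy messages, and the hyperparameters (`l2`, `lr`, `epochs`) are all assumptions made for illustration, and the LLM tagging stage is replaced by hand-written tags. It only shows how multi-hot tag vectors can be fed to an L2-regularized logistic regression to produce scores in [0,1], and how a Brier score is computed from those scores.

```python
import math

# Hypothetical tag vocabulary; the paper's actual label system is not reproduced here.
TAGS = ["theme:health", "claim:unverified", "cta:share", "evidence:none", "evidence:cited"]

def encode(tags):
    """Multi-hot encode a message's tags over the fixed vocabulary."""
    present = set(tags)
    return [1.0 if t in present else 0.0 for t in TAGS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Risk score in (0, 1) for one encoded message."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit(X, y, l2=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2.

    The bias term is left unpenalized. All hyperparameters are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = predict(w, b, x) - target
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * x[j] / n
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def brier(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - t) ** 2 for p, t in zip(probs, labels)) / len(labels)

# Toy data: (tags, label), with label 1 meaning "low credibility". Invented for illustration.
messages = [
    (["theme:health", "claim:unverified", "cta:share", "evidence:none"], 1),
    (["claim:unverified", "evidence:none"], 1),
    (["theme:health", "cta:share", "evidence:none"], 1),
    (["theme:health", "evidence:cited"], 0),
    (["evidence:cited"], 0),
    (["theme:health", "cta:share", "evidence:cited"], 0),
]
X = [encode(tags) for tags, _ in messages]
y = [label for _, label in messages]

weights, bias = fit(X, y)
scores = [predict(weights, bias, x) for x in X]
print("Brier score on toy data:", brier(scores, y))
```

A lower Brier score is better: a model that always outputs 0.5 scores 0.25 on any binary task, which gives some context for the 0.167 and 0.248 figures reported above. In practice a library implementation and a held-out, domain-disjoint split would replace the hand-rolled training loop and the in-sample evaluation used here.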
CV · Mar 11, 2025
FairDeFace: Evaluating the Fairness and Adversarial Robustness of Face Obfuscation Methods
Seyyed Mohammad Sadegh Moosavi Khorzooghi, Poojitha Thota, Mohit Singhal et al.
The lack of a common platform and benchmark datasets for evaluating face obfuscation methods has been a challenge, with every method being tested using arbitrary experiments, datasets, and metrics. While prior work has demonstrated that face recognition systems exhibit bias against some demographic groups, there exists a substantial gap in our understanding regarding the fairness of face obfuscation methods. Providing fair face obfuscation methods can ensure equitable protection across diverse demographic groups, especially since they can be used to preserve the privacy of vulnerable populations. To address these gaps, this paper introduces a comprehensive framework, named FairDeFace, designed to assess the adversarial robustness and fairness of face obfuscation methods. The framework introduces a set of modules encompassing data benchmarks, face detection and recognition algorithms, adversarial models, utility detection models, and fairness metrics. FairDeFace serves as a versatile platform where any face obfuscation method can be integrated, allowing for rigorous testing and comparison with other state-of-the-art methods. In its current implementation, FairDeFace incorporates six attacks and several privacy, utility, and fairness metrics. Using FairDeFace, and by conducting more than 500 experiments, we evaluated and compared the adversarial robustness of seven face obfuscation methods. This extensive analysis led to many interesting findings, both in terms of the degree of robustness of existing methods and their biases against some gender or racial groups. FairDeFace also visualizes the focus areas of both obfuscation and verification attacks, showing not only which areas are changed most in the obfuscation process for some demographics, but also, through comparison of the obfuscation and verification focus areas, why the methods failed.
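One simple way to think about fairness across demographic groups is to compare how often an obfuscation method withstands an attack for each group. The sketch below is an illustration only and is not taken from FairDeFace: the function names, the group labels, the toy records, and the choice of "largest gap in protection rate" as the metric are all assumptions.

```python
from collections import defaultdict

def protection_rates(records):
    """Fraction of attacks that failed, per demographic group.

    `records` is an iterable of (group, attack_succeeded) pairs; this input
    format is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    protected = defaultdict(int)
    for group, attack_succeeded in records:
        totals[group] += 1
        if not attack_succeeded:
            protected[group] += 1
    return {group: protected[group] / totals[group] for group in totals}

def max_gap(rates):
    """Largest difference in protection rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Toy outcomes, invented for illustration: True means the attack re-identified the face.
records = [
    ("group_a", False), ("group_a", False), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", True),
]

rates = protection_rates(records)
print(rates)
print("max gap:", max_gap(rates))
```

A gap of zero would mean every group is protected equally often under this particular attack; larger gaps indicate that some groups receive weaker protection than others.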
CY · Oct 23, 2021
Cybersecurity Misinformation Detection on Social Media: Case Studies on Phishing Reports and Zoom's Threats
Mohit Singhal, Nihal Kumarswamy, Shreyasi Kinhekar et al.
Prior work has extensively studied misinformation related to news, politics, and health; however, misinformation can also be about technological topics. While less controversial, such misinformation can severely impact companies' reputations and revenues, and users' online experiences. Recently, social media has also been increasingly used as a novel knowledge source for extracting timely and relevant security threats, which are fed to threat intelligence systems for better performance. However, with possible campaigns spreading false security threats, these systems can become vulnerable to poisoning attacks. In this work, we proposed novel approaches for detecting misinformation about cybersecurity and privacy threats on social media, focusing on two topics with different types of misinformation: phishing websites and Zoom's security & privacy threats. We developed a framework for detecting inaccurate phishing claims on Twitter. Using this framework, we could label about 9% of URLs and 22% of phishing reports as misinformation. We also proposed another framework for detecting misinformation related to Zoom's security and privacy threats on multiple platforms. Our classifiers achieved more than 98% accuracy. Employing these classifiers on posts from Facebook, Instagram, Reddit, and Twitter, we found that about 18%, 3%, 4%, and 3% of posts, respectively, were misinformation. In addition, we studied the characteristics of misinformation posts, their authors, and their timelines, which helped us identify campaigns.