SI Apr 21
Among Us: Language of Conspiracy Theorists on Mainstream Reddit
Francesco Corso, Giuseppe Russo, Francesco Pierri et al.
The interaction between fringe subcultures and mainstream online communities poses significant challenges for understanding discourse on social media. In this work, we investigate whether users active in conspiracy-focused communities exhibit detectable linguistic signatures when participating in general-interest spaces, such as news, humor, or hobbyist forums. We analyze a large-scale longitudinal dataset of over 500 million comments spanning 10 years of Reddit activity, examining the communication patterns of these users across diverse social contexts independent of the topics they discuss. We show that these users exhibit distinctive linguistic patterns that enable machine learning models to reliably distinguish them from the general population within individual communities (averaging 87% accuracy across more than 20 binary classification tasks). Crucially, no single aggregate model captures these patterns across communities, as community-specific models outperform global classifiers by up to 17 percentage points. This result suggests that while these users are distinct, their linguistic expression is dynamic and highly responsive to the social norms of the environment they inhabit. Our findings suggest the need for tailored interventions in online spaces, as linguistic signals associated with conspiracy and fringe subcultures vary across communities and cannot be effectively addressed by uniform detection or moderation strategies.
CY Mar 7, 2025 (Code)
Evaluating open-source Large Language Models for automated fact-checking
Nicolo' Fontana, Francesco Corso, Enrico Zuccolotto et al.
The increasing prevalence of online misinformation has heightened the demand for automated fact-checking solutions. Large Language Models (LLMs) have emerged as potential tools for assisting in this task, but their effectiveness remains uncertain. This study evaluates the fact-checking capabilities of various open-source LLMs, focusing on their ability to assess claims with different levels of contextual information. We conduct three key experiments: (1) evaluating whether LLMs can identify the semantic relationship between a claim and a fact-checking article, (2) assessing models' accuracy in verifying claims when given a related fact-checking article, and (3) testing LLMs' fact-checking abilities when leveraging data from external knowledge sources such as Google and Wikipedia. Our results indicate that LLMs perform well in identifying claim-article connections and verifying fact-checked stories but struggle with confirming factual news, where they are outperformed by traditional fine-tuned models such as RoBERTa. Additionally, the introduction of external knowledge does not significantly enhance LLMs' performance, calling for more tailored approaches. Our findings highlight both the potential and limitations of LLMs in automated fact-checking, emphasizing the need for further refinements before they can reliably replace human fact-checkers.
SI Apr 3
Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data
Elisa Composta, Nicolo' Fontana, Francesco Corso et al.
Online social networks offer a valuable lens to analyze both individual and collective phenomena. Researchers often use simulators to explore controlled scenarios, and the integration of Large Language Models (LLMs) makes these simulations more realistic by enabling agents to understand and generate natural language content. In this work, we investigate the behavior of LLM-based agents in a simulated microblogging social network. We initialize agents with realistic profiles calibrated on real-world online conversations from the 2022 Italian political election and extend an existing simulator by introducing mechanisms for opinion modeling. We examine how LLM agents simulate online conversations, interact with others, and evolve their opinions under different scenarios. Our results show that LLM agents generate coherent content, form connections, and build a realistic social network structure. However, their generated content displays less heterogeneity in tone and toxicity compared to real data. We also find that LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models. Varying parameter configurations produces no significant changes, indicating that simulations require more careful cognitive modeling at initialization to replicate human behavior more faithfully. Overall, we demonstrate the potential of LLMs for simulating user behavior in social environments, while also identifying key challenges in capturing heterogeneity and complex dynamics.
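The abstract compares LLM-based opinion dynamics with "traditional mathematical models" without naming one. As an illustration only, the sketch below assumes a bounded-confidence model of the Deffuant type, in which two randomly chosen agents move toward each other when their opinions differ by less than a threshold. The function names, parameter values (`epsilon`, `mu`), and the uniform random initialization are hypothetical choices and are not taken from the paper.

```python
import random


def deffuant_step(opinions, epsilon, mu, rng):
    """Apply one pairwise interaction in place (bounded-confidence update)."""
    i, j = rng.sample(range(len(opinions)), 2)  # two distinct agents
    diff = opinions[j] - opinions[i]
    # Agents only influence each other if their opinions are close enough.
    if abs(diff) < epsilon:
        opinions[i] += mu * diff
        opinions[j] -= mu * diff


def simulate(n_agents=50, steps=5000, epsilon=0.3, mu=0.5, seed=0):
    """Return (initial, final) opinion lists; all values are illustrative."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n_agents)]  # opinions in [0, 1)
    initial = list(opinions)
    for _ in range(steps):
        deffuant_step(opinions, epsilon, mu, rng)
    return initial, opinions


initial, final = simulate()
spread_before = max(initial) - min(initial)
spread_after = max(final) - min(final)
print(f"opinion spread before: {spread_before:.3f}, after: {spread_after:.3f}")
```

Because each update is symmetric, the mean opinion is preserved while the spread can only shrink, which is the kind of baseline behavior against which LLM-driven opinion trajectories could be compared.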
CY Apr 3
Effects of Algorithmic Visibility on Conspiracy Communities: Reddit after Epstein's 'Suicide'
Asja Attanasio, Francesco Corso, Gianmarco De Francisci Morales et al.
Following the death of Jeffrey Epstein, the subreddit r/conspiracy experienced a significant visibility shock that brought mainstream users into direct contact with established conspiracy narratives. In this work, we explore how large-scale surges in public attention reshape participation and discourse within online conspiracy communities. We ask whether a sudden increase in exposure changes who joins r/conspiracy, how long they stay, and how they adapt linguistically, compared with users who arrive through organic discovery. Using a computational framework that combines toxicity scores, survival analysis, and lexical and semantic measures over a period of 12 months, we observe that mainstream visibility is associated with patterns consistent with a selection mechanism rather than a simple amplifier. Users who join the conspiracy community during the arrest period tend to show higher linguistic similarity to core users, especially regarding linguistic and thematic norms, and more stable engagement over time. By contrast, users who arrive during the height of public visibility remain semantically distant from core discourse and participate more briefly. Overall, we find that mainstream visibility is connected with changes in audience size, community composition, and linguistic cohesion. However, incidental exposure during attention shocks does not typically produce durable, integrated community members. These results provide a more nuanced understanding of how external events and platform visibility influence the growth and evolution of conspiracy spaces, offering insights for the design of responsible and transparent recommendation systems.
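The abstract mentions survival analysis but does not say which estimator was used. As a rough illustration of how "how long they stay" can be quantified, the sketch below assumes a Kaplan-Meier estimator over user activity durations, where a user who is still active at the end of the observation window is treated as censored. The function name, the toy durations, and the censoring flags are hypothetical and are not data from the paper.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Estimate the survival curve S(t) as a list of (time, probability).

    durations: time each user stayed active (e.g., in months).
    observed:  True if the user was seen to leave, False if censored.
    """
    leaves = Counter(d for d, o in zip(durations, observed) if o)
    exits = Counter(durations)  # leaves plus censored users, per time point
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = leaves.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk users who did not leave at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy cohort: five users, two of them still active when observation ends.
durations = [1, 2, 2, 3, 5]
observed = [True, True, False, True, False]
for t, s in kaplan_meier(durations, observed):
    print(f"t={t}: S(t)={s:.2f}")
```

Computing one such curve per arrival cohort (for example, users who joined during the attention peak versus those who arrived organically) would allow the retention of the two groups to be compared.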
CL Nov 5, 2025
Do Androids Dream of Unseen Puppeteers? Probing for a Conspiracy Mindset in Large Language Models
Francesco Corso, Francesco Pierri, Gianmarco De Francisci Morales
In this paper, we investigate whether Large Language Models (LLMs) exhibit conspiratorial tendencies, whether they display sociodemographic biases in this domain, and how easily they can be conditioned into adopting conspiratorial perspectives. Conspiracy beliefs play a central role in the spread of misinformation and in shaping distrust toward institutions, making them a critical testbed for evaluating the social fidelity of LLMs. LLMs are increasingly used as proxies for studying human behavior, yet little is known about whether they reproduce higher-order psychological constructs such as a conspiratorial mindset. To bridge this research gap, we administer validated psychometric surveys measuring conspiracy mindset to multiple models under different prompting and conditioning strategies. Our findings reveal that LLMs show partial agreement with elements of conspiracy belief, and conditioning with socio-demographic attributes produces uneven effects, exposing latent demographic biases. Moreover, targeted prompts can easily shift model responses toward conspiratorial directions, underscoring both the susceptibility of LLMs to manipulation and the potential risks of their deployment in sensitive contexts. These results highlight the importance of critically evaluating the psychological dimensions embedded in LLMs, both to advance computational social science and to inform possible mitigation strategies against harmful uses.
CL May 29, 2025
Evaluating AI capabilities in detecting conspiracy theories on YouTube
Leonardo La Rocca, Francesco Corso, Francesco Pierri
As a leading online platform with a vast global audience, YouTube is susceptible to hosting harmful content, including disinformation and conspiracy theories. This study explores the use of open-weight Large Language Models (LLMs), both text-only and multimodal, for identifying conspiracy theory videos shared on YouTube. Leveraging a labeled dataset of thousands of videos, we evaluate a variety of LLMs in a zero-shot setting and compare their performance to a fine-tuned RoBERTa baseline. Results show that text-based LLMs achieve high recall but lower precision, leading to increased false positives. Multimodal models lag behind their text-only counterparts, indicating limited benefits from visual data integration. To assess real-world applicability, we evaluate the most accurate models on an unlabeled dataset, finding that RoBERTa achieves performance close to LLMs with a larger number of parameters. Our work highlights the strengths and limitations of current LLM-based approaches for online harmful content detection, emphasizing the need for more precise and robust systems.
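To make the "high recall but lower precision" pattern concrete, the sketch below computes both metrics from binary labels using their standard definitions. The labels are a toy example invented for illustration (1 = conspiracy video, 0 = other) and are not results from the paper.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: the classifier catches both conspiracy videos
# but also flags two harmless ones.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # prints "0.5 1.0"
```

Here every positive is found (recall 1.0), yet half of the flagged items are false positives (precision 0.5), which mirrors the trade-off the abstract describes for text-based LLMs.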