Alessandro Flammini

SI
h-index65
16papers
2,907citations
Novelty38%
AI Score44

16 Papers

SIMay 24, 2018
The spread of low-credibility content by social bots

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol et al.

The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

CYJul 14, 2017
How algorithmic popularity bias hinders or promotes quality

Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Filippo Menczer et al.

Algorithms that favor popular items are used to help us select among many choices, from engaging articles on a social media news feed to songs and books that others have purchased, and from top-raked search engine results to highly-cited scientific papers. The goal of these algorithms is to identify high-quality items such as reliable news, beautiful movies, prestigious information sources, and important discoveries --- in short, high-quality content should rank at the top. Prior work has shown that choosing what is popular may amplify random fluctuations and ultimately lead to sub-optimal rankings. Nonetheless, it is often assumed that recommending what is popular will help high-quality content "bubble up" in practice. Here we identify the conditions in which popularity may be a viable proxy for quality content by studying a simple model of cultural market endowed with an intrinsic notion of quality. A parameter representing the cognitive cost of exploration controls the critical trade-off between quality and popularity. We find a regime of intermediate exploration cost where an optimal balance exists, such that choosing what is popular actually promotes high-quality items to the top. Outside of these limits, however, popularity bias is more likely to hinder quality. These findings clarify the effects of algorithmic popularity bias on quality outcomes, and may inform the design of more principled mechanisms for techno-social cultural markets.

CYAug 25, 2020
Social Influence and Unfollowing Accelerate the Emergence of Echo Chambers

Kazutoshi Sasahara, Wen Chen, Hao Peng et al.

While social media make it easy to connect with and access information from anyone, they also facilitate basic influence and unfriending mechanisms that may lead to segregated and polarized clusters known as "echo chambers." Here we study the conditions in which such echo chambers emerge by introducing a simple model of information sharing in online social networks with the two ingredients of influence and unfriending. Users can change both their opinions and social connections based on the information to which they are exposed through sharing. The model dynamics show that even with minimal amounts of influence and unfriending, the social network rapidly devolves into segregated, homogeneous communities. These predictions are consistent with empirical data from Twitter. Although our findings suggest that echo chambers are somewhat inevitable given the mechanisms at play in online social media, they also provide insights into possible mitigation strategies.

SIApr 21
Emergence of Stereotypes and Affective Polarization from Belief Network Dynamics

Ozgur Can Seckin, Rachith Aiyappa, Madalina Vlasceanu et al.

Our belief systems are shaped by social processes, such as observations and influence, and by cognitive processes, such as the drive for internal coherence. These processes steer how individual beliefs evolve and become connected. The resulting belief networks contain both causal and associative links, including spurious ones, such as stereotypes. Here, we develop an agent-based model of belief networks that demonstrates how two basic mechanisms -- social interaction and a drive for internal coherence -- can give rise to such stereotypes without any underlying reality. We further demonstrate how stereotypes, when coupled with shared group identity, can give rise to affective polarization, even in the absence of ideological conflicts.

SIApr 12
Israel-Hamas War on X: A Case Study of Coordinated Campaigns and Information Integrity

Tuğrulcan Elmas, Filipi Nascimento Silva, Manita Pote et al.

Coordinated campaigns on social media play a critical role in shaping crisis information environments, particularly during the onset of conflicts when uncertainty is high and verified information is scarce. We study the interplay between coordinated campaigns and information integrity through a case study of the 2023 Israel-Hamas War on Twitter (X). We analyze 4.5~million tweets and employ established coordination detection methods to identify 11 coordinated groups involving 541 accounts. We characterize these groups through a multimodal analysis that includes topics, account amplification, toxicity, emotional tone, visual themes, and misleading claims. Our analysis reveal that coordinated campaigns rely predominantly on low-complexity tactics, such as retweet amplification and copy-paste diffusion, and promote distinct narratives consistent with a fragmented manipulation landscape, without centralized control. Widely amplified misleading claims concentrate within just three of the identified coordinated groups; the remaining groups primarily engage in advocacy, religious solidarity, or humanitarian mobilization. Claim-level integrity, toxicity, and emotional signals are mutually uncorrelated: no single behavioral signal is a reliable proxy for the others. Targeting the most prolific spreaders of misleading content for moderation would be effective in reducing such content. However, targeting prolific amplifiers in general would not achieve the same mitigation effect. These findings suggest that evaluating coordination structures jointly with their specific content footprints is needed to effectively prioritize moderation interventions.

LGOct 25, 2024
Coordinated Reply Attacks in Influence Operations: Characterization and Detection

Manita Pote, Tuğrulcan Elmas, Alessandro Flammini et al.

Coordinated reply attacks are a tactic observed in online influence operations and other coordinated campaigns to support or harass targeted individuals, or influence them or their followers. Despite its potential to influence the public, past studies have yet to analyze or provide a methodology to detect this tactic. In this study, we characterize coordinated reply attacks in the context of influence operations on Twitter. Our analysis reveals that the primary targets of these attacks are influential people such as journalists, news media, state officials, and politicians. We propose two supervised machine-learning models, one to classify tweets to determine whether they are targeted by a reply attack, and one to classify accounts that reply to a targeted tweet to determine whether they are part of a coordinated attack. The classifiers achieve AUC scores of 0.88 and 0.97, respectively. These results indicate that accounts involved in reply attacks can be detected, and the targeted accounts themselves can serve as sensors for influence operation detection.

SIJun 11, 2020
Detection of Novel Social Bots by Ensembles of Specialized Classifiers

Mohsen Sayyadiharikandeh, Onur Varol, Kai-Cheng Yang et al.

Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. While researchers have developed sophisticated methods to detect abuse, novel bots with diverse behaviors evade detection. We show that different types of bots are characterized by different behavioral features. As a result, supervised learning techniques suffer severe performance deterioration when attempting to detect behaviors not observed in the training data. Moreover, tuning these models to recognize novel bots requires retraining with a significant amount of new annotations, which are expensive to obtain. To address these issues, we propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule. The ensemble of specialized classifiers (ESC) can better generalize, leading to an average improvement of 56\% in F1 score for unseen accounts across datasets. Furthermore, novel bot behaviors are learned with fewer labeled examples during retraining. We deployed ESC in the newest version of Botometer, a popular tool to detect social bots in the wild, with a cross-validation AUC of 0.99.

DLNov 27, 2019
Recency predicts bursts in the evolution of author citations

Filipi Nascimento Silva, Aditya Tandon, Diego Raphael Amancio et al.

The citations process for scientific papers has been studied extensively. But while the citations accrued by authors are the sum of the citations of their papers, translating the dynamics of citation accumulation from the paper to the author level is not trivial. Here we conduct a systematic study of the evolution of author citations, and in particular their bursty dynamics. We find empirical evidence of a correlation between the number of citations most recently accrued by an author and the number of citations they receive in the future. Using a simple model where the probability for an author to receive new citations depends only on the number of citations collected in the previous 12-24 months, we are able to reproduce both the citation and burst size distributions of authors across multiple decades.

IRDec 22, 2017
RelSifter: Scoring Triples from Type-like Relations - The Samphire Triple Scorer at WSDM Cup 2017

Prashant Shiralkar, Mihai Avram, Giovanni Luca Ciampaglia et al.

We present RelSifter, a supervised learning approach to the problem of assigning relevance scores to triples expressing type-like relations such as 'profession' and 'nationality.' To provide additional contextual information about individuals and relations we supplement the data provided as part of the WSDM 2017 Triple Score contest with Wikidata and DBpedia, two large-scale knowledge graphs (KG). Our hypothesis is that any type relation, i.e., a specific profession like 'actor' or 'scientist,' can be described by the set of typical "activities" of people known to have that type relation. For example, actors are known to star in movies, and scientists are known for their academic affiliations. In a KG, this information is to be found on a properly defined subset of the second-degree neighbors of the type relation. This form of local information can be used as part of a learning algorithm to predict relevance scores for new, unseen triples. When scoring 'profession' and 'nationality' triples our experiments based on this approach result in an accuracy equal to 73% and 78%, respectively. These performance metrics are roughly equivalent or only slightly below the state of the art prior to the present contest. This suggests that our approach can be effective for evaluating facts, despite the skewness in the number of facts per individual mined from KGs.

AIAug 24, 2017
Finding Streams in Knowledge Graphs to Support Fact Checking

Prashant Shiralkar, Alessandro Flammini, Filippo Menczer et al.

The volume and velocity of information that gets generated online limits current journalistic practices to fact-check claims at the same rate. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformation spread. Such approaches can be designed to not only be scalable and effective at assessing veracity of dubious claims, but also to boost a human fact checker's productivity by surfacing relevant facts and patterns to aid their analysis. To this end, we present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a (subject, predicate, object) triple. We view a knowledge graph of background information about real-world entities as a flow network, and knowledge as a fluid, abstract commodity. We show that computational fact checking of such a triple then amounts to finding a "knowledge stream" that emanates from the subject node and flows toward the object node through paths connecting them. Evaluation on a range of real-world and hand-crafted datasets of facts related to entertainment, business, sports, geography and more reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful path patterns and surface relevant facts that may help a human fact checker corroborate or refute a claim.

SIOct 20, 2016
Information Overload in Group Communication: From Conversation to Cacophony in the Twitch Chat

Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Yong-Yeol Ahn et al.

Online communication channels, especially social web platforms, are rapidly replacing traditional ones. Online platforms allow users to overcome physical barriers, enabling worldwide participation. However, the power of online communication bears an important negative consequence --- we are exposed to too much information to process. Too many participants, for example, can turn online public spaces into noisy, overcrowded fora where no meaningful conversation can be held. Here we analyze a large dataset of public chat logs from Twitch, a popular video streaming platform, in order to examine how information overload affects online group communication. We measure structural and textual features of conversations such as user output, interaction, and information content per message across a wide range of information loads. Our analysis reveals the existence of a transition from a conversational state to a cacophony --- a state of overload with lower user participation, more copy-pasted messages, and less information per message. These results hold both on average and at the individual level for the majority of users. This study provides a quantitative basis for further studies of the social effects of information overload, and may guide the design of more resilient online communication systems.

SIMay 2, 2016
Predicting online extremism, content adopters, and interaction reciprocity

Emilio Ferrara, Wen-Qiang Wang, Onur Varol et al.

We present a machine learning framework that leverages a mixture of metadata, network, and temporal features to detect extremist users, and predict content adopters and interaction reciprocity in social media. We exploit a unique dataset containing millions of tweets generated by more than 25 thousand users who have been manually identified, reported, and suspended by Twitter due to their involvement with extremist campaigns. We also leverage millions of tweets generated by a random sample of 25 thousand regular users who were exposed to, or consumed, extremist content. We carry out three forecasting tasks, (i) to detect extremist users, (ii) to estimate whether regular users will adopt extremist content, and finally (iii) to predict whether users will reciprocate contacts initiated by extremists. All forecasting tasks are set up in two scenarios: a post hoc (time independent) prediction task on aggregated data, and a simulated real-time prediction task. The performance of our framework is extremely promising, yielding in the different forecasting scenarios up to 93% AUC for extremist user detection, up to 80% AUC for content adoption prediction, and finally up to 72% AUC for interaction reciprocity forecasting. We conclude by providing a thorough feature analysis that helps determine which are the emerging signals that provide predictive power in different scenarios.

SIJan 20, 2016
The DARPA Twitter Bot Challenge

V. S. Subrahmanian, Amos Azaria, Skylar Durst et al.

A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). However, with the exception of [3], no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and describes the methods used by the three top-ranked teams.

SIFeb 20, 2015
On predictability of rare events leveraging social media: a machine learning perspective

Lei Le, Emilio Ferrara, Alessandro Flammini

Information extracted from social media streams has been leveraged to forecast the outcome of a large number of real-world events, from political elections to stock market fluctuations. An increasing amount of studies demonstrates how the analysis of social media conversations provides cheap access to the wisdom of the crowd. However, extents and contexts in which such forecasting power can be effectively leveraged are still unverified at least in a systematic way. It is also unclear how social-media-based predictions compare to those based on alternative information sources. To address these issues, here we develop a machine learning framework that leverages social media streams to automatically identify and predict the outcomes of soccer matches. We focus in particular on matches in which at least one of the possible outcomes is deemed as highly unlikely by professional bookmakers. We argue that sport events offer a systematic approach for testing the predictive power of social media, and allow to compare such power against the rigorous baselines set by external sources. Despite such strict baselines, our framework yields above 8% marginal profit when used to inform simple betting strategies. The system is based on real-time sentiment analysis and exploits data collected immediately before the games, allowing for informed bets. We discuss the rationale behind our approach, describe the learning framework, its prediction performance and the return it provides as compared to a set of betting strategies. To test our framework we use both historical Twitter data from the 2014 FIFA World Cup games, and real-time Twitter data collected by monitoring the conversations about all soccer matches of four major European tournaments (FA Premier League, Serie A, La Liga, and Bundesliga), and the 2014 UEFA Champions League, during the period between Oct. 25th 2014 and Nov. 26th 2014.

SINov 3, 2014
Clustering memes in social media streams

Mohsen JafariAsbagh, Emilio Ferrara, Onur Varol et al.

The problem of clustering content in social media has pervasive applications, including the identification of discussion topics, event detection, and content recommendation. Here we describe a streaming framework for online detection and clustering of memes in social media, specifically Twitter. A pre-clustering procedure, namely protomeme detection, first isolates atomic tokens of information carried by the tweets. Protomemes are thereafter aggregated, based on multiple similarity measures, to obtain memes as cohesive groups of tweets reflecting actual concepts or topics of discussion. The clustering algorithm takes into account various dimensions of the data and metadata, including natural language, the social network, and the patterns of information diffusion. As a result, our system can build clusters of semantically, structurally, and topically related tweets. The clustering process is based on a variant of Online K-means that incorporates a memory mechanism, used to "forget" old memes and replace them over time with the new ones. The evaluation of our framework is carried out by using a dataset of Twitter trending topics. Over a one-week period, we systematically determined whether our algorithm was able to recover the trending hashtags. We show that the proposed method outperforms baseline algorithms that only use content features, as well as a state-of-the-art event detection method that assumes full knowledge of the underlying follower network. We finally show that our online learning framework is flexible, due to its independence of the adopted clustering algorithm, and best suited to work in a streaming scenario.

SIMay 4, 2012
Partisan Asymmetries in Online Political Activity

Michael D. Conover, Bruno Gonçalves, Alessandro Flammini et al.

We examine partisan differences in the behavior, communication patterns and social interactions of more than 18,000 politically-active Twitter users to produce evidence that points to changing levels of partisan engagement with the American online political landscape. Analysis of a network defined by the communication activity of these users in proximity to the 2010 midterm congressional elections reveals a highly segregated, well clustered partisan community structure. Using cluster membership as a high-fidelity (87% accuracy) proxy for political affiliation, we characterize a wide range of differences in the behavior, communication and social connectivity of left- and right-leaning Twitter users. We find that in contrast to the online political dynamics of the 2008 campaign, right-leaning Twitter users exhibit greater levels of political activity, a more tightly interconnected social structure, and a communication network topology that facilitates the rapid and broad dissemination of political information.