Giovanni Luca Ciampaglia

SI
13 papers
2,114 citations
Novelty 41%
AI Score 46

13 Papers

SI May 24, 2018
The spread of low-credibility content by social bots

Chengcheng Shao, Giovanni Luca Ciampaglia, Onur Varol et al.

The massive spread of digital misinformation has been identified as a major global risk and has been alleged to influence elections and threaten democracies. Communication, cognitive, social, and computer scientists are engaged in efforts to study the complex causes for the viral diffusion of misinformation online and to develop solutions, while search and social media platforms are beginning to deploy countermeasures. With few exceptions, these efforts have been mainly informed by anecdotal evidence rather than systematic data. Here we analyze 14 million messages spreading 400 thousand articles on Twitter during and following the 2016 U.S. presidential campaign and election. We find evidence that social bots played a disproportionate role in amplifying low-credibility content. Accounts that actively spread articles from low-credibility sources are significantly more likely to be bots. Automated accounts are particularly active in amplifying content in the very early spreading moments, before an article goes viral. Bots also target users with many followers through replies and mentions. Humans are vulnerable to this manipulation, retweeting bots who post links to low-credibility content. Successful low-credibility sources are heavily supported by social bots. These results suggest that curbing social bots may be an effective strategy for mitigating the spread of online misinformation.

CL Oct 8, 2023
Factuality Challenges in the Era of Large Language Models

Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha et al.

The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations." Moreover, LLMs can be exploited for malicious applications, such as generating false but credible-sounding content and profiles at scale. This poses a significant challenge to society in terms of the potential deception of users and the increasing dissemination of inaccurate information. In light of these risks, we explore the kinds of technological innovations, regulatory reforms, and AI literacy initiatives needed from fact-checkers, news organizations, and the broader research and policy communities. By identifying the risks, the imminent threats, and some viable solutions, we seek to shed light on navigating various aspects of veracity in the era of generative AI.

CY Jul 14, 2017
How algorithmic popularity bias hinders or promotes quality

Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Filippo Menczer et al.

Algorithms that favor popular items are used to help us select among many choices, from engaging articles on a social media news feed to songs and books that others have purchased, and from top-ranked search engine results to highly-cited scientific papers. The goal of these algorithms is to identify high-quality items such as reliable news, beautiful movies, prestigious information sources, and important discoveries --- in short, high-quality content should rank at the top. Prior work has shown that choosing what is popular may amplify random fluctuations and ultimately lead to sub-optimal rankings. Nonetheless, it is often assumed that recommending what is popular will help high-quality content "bubble up" in practice. Here we identify the conditions in which popularity may be a viable proxy for quality content by studying a simple model of a cultural market endowed with an intrinsic notion of quality. A parameter representing the cognitive cost of exploration controls the critical trade-off between quality and popularity. We find a regime of intermediate exploration cost where an optimal balance exists, such that choosing what is popular actually promotes high-quality items to the top. Outside of these limits, however, popularity bias is more likely to hinder quality. These findings clarify the effects of algorithmic popularity bias on quality outcomes, and may inform the design of more principled mechanisms for techno-social cultural markets.

CY Aug 25, 2020
Social Influence and Unfollowing Accelerate the Emergence of Echo Chambers

Kazutoshi Sasahara, Wen Chen, Hao Peng et al.

While social media make it easy to connect with and access information from anyone, they also facilitate basic influence and unfriending mechanisms that may lead to segregated and polarized clusters known as "echo chambers." Here we study the conditions in which such echo chambers emerge by introducing a simple model of information sharing in online social networks with the two ingredients of influence and unfriending. Users can change both their opinions and social connections based on the information to which they are exposed through sharing. The model dynamics show that even with minimal amounts of influence and unfriending, the social network rapidly devolves into segregated, homogeneous communities. These predictions are consistent with empirical data from Twitter. Although our findings suggest that echo chambers are somewhat inevitable given the mechanisms at play in online social media, they also provide insights into possible mitigation strategies.
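
The two ingredients named in this abstract, influence and unfriending, can be illustrated with a toy simulation. The sketch below is a simplified bounded-confidence model with rewiring, not the authors' model: the function name `simulate`, the parameters `tolerance`, `influence`, and `unfriend_prob`, and the update rules are all assumptions made for illustration.

```python
import random

def simulate(n=30, k=4, steps=2000, tolerance=0.4, influence=0.5,
             unfriend_prob=0.5, seed=0):
    """Toy opinion dynamics with influence and unfriending (illustrative only)."""
    rng = random.Random(seed)
    # Every user starts with a random opinion in [-1, 1].
    opinions = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Every user follows k distinct other users chosen at random.
    friends = [set(rng.sample([j for j in range(n) if j != i], k))
               for i in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        # Split friends into those within the tolerance band and the rest.
        agree = [j for j in friends[i]
                 if abs(opinions[j] - opinions[i]) < tolerance]
        disagree = [j for j in friends[i] if j not in agree]
        # Influence: move toward the mean opinion of agreeing friends.
        if agree:
            mean = sum(opinions[j] for j in agree) / len(agree)
            opinions[i] += influence * (mean - opinions[i])
        # Unfriending: sometimes swap one disagreeing friend for a random user.
        if disagree and rng.random() < unfriend_prob:
            candidates = [j for j in range(n)
                          if j != i and j not in friends[i]]
            if candidates:
                friends[i].remove(rng.choice(disagree))
                friends[i].add(rng.choice(candidates))
    return opinions, friends

opinions, friends = simulate()
```

Because each update is a convex combination of existing opinions, values stay within [-1, 1], and every user keeps exactly `k` friends because each unfriending is paired with one new connection.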

30.8 HC Apr 13
Understanding the Gap Between Stated and Revealed Preferences in News Curation: A Study of Young Adult Social Media Users

Do Won Kim, Cody Buntain, Giovanni Luca Ciampaglia

Social media feed algorithms infer user preferences from their past behaviors. Yet what drives engagement often diverges from what users value. We examine this gap between stated preferences (what users say they prefer) and revealed preferences (what their behavior suggests they prefer) among young adults, a group deeply embedded in algorithmically mediated environments. Using a mixed-methods approach combining surveys and interviews with feed curation activities, we investigate: what gaps exist between stated and revealed preferences; how users make sense of these gaps; what values users believe should guide algorithmic curation; and how systems might reflect those values. Participants often found themselves engaging with low-quality content they did not endorse, despite wanting high-quality information. When asked to curate an ideal social media news feed for a hypothetical persona, participants created feeds they considered more satisfying and higher in quality by prioritizing values such as accuracy and diversity. In doing so, they navigated trade-offs between different values, factoring in social relationships and context surrounding the persona. These findings suggest that feed curation is a socially situated process of judging what should be visible and appropriate in shared information spaces. Based on these insights, we offer design directions for bridging the gap between stated and revealed preferences.

75.4 SI May 19
Platform architecture determines whether recommendation algorithms can shape information quality on social media

Mohammad Hammas Saeed, David A. Broniatowski, Joseph Simons et al.

Social media platforms shape public discourse through two fundamental design choices that naturally co-occur in any field investigation: platform architecture, which defines what types of actors exist and how they interact, and the recommendation algorithm, which determines what content is surfaced to users. Using agent-based simulation, we orthogonally manipulate both factors, exploring four prototypical architectures -- tree (e.g., Reddit), layered hierarchy (e.g., Facebook), network (e.g., Twitter), and complete graph (e.g., TikTok) -- and two algorithms: chronological (LIFO) and popularity-based (Hot). Drawing on prior theory that identifies and ranks canonical system architectures in terms of their flexibility, we hypothesize that algorithmic effects on information spread and quality should be largest on the most flexible platforms and smallest on the most constrained ones. We find strong confirmation of this prediction. On tree-like platforms like Reddit, the algorithm has no detectable effect on information spread and quality. On layered hierarchies and networks like Facebook and Twitter, respectively, the Hot algorithm has modest positive effects on both the spread of information and its quality. On complete structures like TikTok, the Hot algorithm leads to winner-take-all dynamics that have strong negative effects on both information spread and quality, making the relation between content quality and popularity unpredictable. These findings imply that architectural considerations are more powerful levers than algorithmic interventions for the design of healthy online spaces and public discourse. Platform reform efforts focused exclusively on algorithm choice may be insufficient on architecturally unconstrained platforms and unnecessary on architecturally constrained ones.
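
The difference between the two feed algorithms named in this abstract can be sketched as two sort orders. This is a minimal illustration under stated assumptions, not the paper's simulation code: the `Post` class, its fields, and the choice of like count as the popularity signal for "Hot" are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Post:
    post_id: str
    created_at: int  # hypothetical time step at which the post was created
    likes: int       # hypothetical engagement count

def rank_lifo(posts):
    """Chronological feed: newest post first (last in, first out)."""
    return sorted(posts, key=lambda p: p.created_at, reverse=True)

def rank_hot(posts):
    """Popularity feed: most-liked post first; newer posts break ties."""
    return sorted(posts, key=lambda p: (p.likes, p.created_at), reverse=True)

posts = [Post("a", 1, 10), Post("b", 2, 0), Post("c", 3, 4)]
lifo_ids = [p.post_id for p in rank_lifo(posts)]  # ["c", "b", "a"]
hot_ids = [p.post_id for p in rank_hot(posts)]    # ["a", "c", "b"]
```

Under the chronological ordering an old but well-liked post sinks to the bottom, while the popularity ordering keeps it on top, which is the kind of feedback that can concentrate attention on a few items.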

SI Feb 22, 2021
REMOD: Relation Extraction for Modeling Online Discourse

Matthew Sumpter, Giovanni Luca Ciampaglia

The enormous amount of discourse taking place online poses challenges to the functioning of a civil and informed public sphere. Efforts to standardize online discourse data, such as ClaimReview, are making available a wealth of new data about potentially inaccurate claims, reviewed by third-party fact-checkers. These data could help shed light on the nature of online discourse, the role of political elites in amplifying it, and its implications for the integrity of the online information ecosystem. Unfortunately, the semi-structured nature of much of this data presents significant challenges when it comes to modeling and reasoning about online discourse. A key challenge is relation extraction, which is the task of determining the semantic relationships between named entities in a claim. Here we develop a novel supervised learning method for relation extraction that combines graph embedding techniques with path traversal on semantic dependency graphs. Our approach is based on the intuitive observation that knowledge of the entities along the path between the subject and object of a triple (e.g. Washington,_D.C. and United_States_of_America) provides useful information that can be leveraged for extracting its semantic relation (i.e. capitalOf). As an example of a potential application of this technique for modeling online discourse, we show that our method can be integrated into a pipeline to reason about potential misinformation claims.

LG Aug 15, 2019
HONEM: Learning Embedding for Higher Order Networks

Mandana Saebi, Giovanni Luca Ciampaglia, Lance M Kaplan et al.

Representation learning on networks offers a powerful alternative to the oft painstaking process of manual feature engineering, and as a result, has enjoyed considerable success in recent years. However, all the existing representation learning methods are based on the first-order network (FON), that is, the network that only captures the pairwise interactions between the nodes. As a result, these methods may fail to incorporate non-Markovian higher-order dependencies in the network. Thus, the embeddings that are generated may not accurately represent the underlying phenomena in a network, resulting in inferior performance in different inductive or transductive learning tasks. To address this challenge, this paper presents HONEM, a higher-order network embedding method that captures the non-Markovian higher-order dependencies in a network. HONEM is specifically designed for the higher-order network structure (HON) and outperforms other state-of-the-art methods in node classification, network reconstruction, link prediction, and visualization for networks that contain non-Markovian higher-order dependencies.

IR Dec 22, 2017
RelSifter: Scoring Triples from Type-like Relations - The Samphire Triple Scorer at WSDM Cup 2017

Prashant Shiralkar, Mihai Avram, Giovanni Luca Ciampaglia et al.

We present RelSifter, a supervised learning approach to the problem of assigning relevance scores to triples expressing type-like relations such as 'profession' and 'nationality.' To provide additional contextual information about individuals and relations we supplement the data provided as part of the WSDM 2017 Triple Score contest with Wikidata and DBpedia, two large-scale knowledge graphs (KG). Our hypothesis is that any type relation, i.e., a specific profession like 'actor' or 'scientist,' can be described by the set of typical "activities" of people known to have that type relation. For example, actors are known to star in movies, and scientists are known for their academic affiliations. In a KG, this information is to be found on a properly defined subset of the second-degree neighbors of the type relation. This form of local information can be used as part of a learning algorithm to predict relevance scores for new, unseen triples. When scoring 'profession' and 'nationality' triples our experiments based on this approach result in an accuracy equal to 73% and 78%, respectively. These performance metrics are roughly equivalent or only slightly below the state of the art prior to the present contest. This suggests that our approach can be effective for evaluating facts, despite the skewness in the number of facts per individual mined from KGs.

AI Aug 24, 2017
Finding Streams in Knowledge Graphs to Support Fact Checking

Prashant Shiralkar, Alessandro Flammini, Filippo Menczer et al.

The volume and velocity of information generated online limit the ability of current journalistic practices to fact-check claims at the same rate. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformation spread. Such approaches can be designed to not only be scalable and effective at assessing veracity of dubious claims, but also to boost a human fact checker's productivity by surfacing relevant facts and patterns to aid their analysis. To this end, we present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a (subject, predicate, object) triple. We view a knowledge graph of background information about real-world entities as a flow network, and knowledge as a fluid, abstract commodity. We show that computational fact checking of such a triple then amounts to finding a "knowledge stream" that emanates from the subject node and flows toward the object node through paths connecting them. Evaluation on a range of real-world and hand-crafted datasets of facts related to entertainment, business, sports, geography and more reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful path patterns and surface relevant facts that may help a human fact checker corroborate or refute a claim.
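
The idea of treating a knowledge graph as a flow network can be illustrated with a plain maximum-flow computation between a subject node and an object node. The sketch below uses the textbook Edmonds-Karp algorithm on a toy graph; it is not the authors' algorithm, and the node names and edge capacities are hypothetical placeholders (the abstract does not say how capacities are assigned).

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map."""
    if source == sink:
        raise ValueError("source and sink must differ")
    # Build a residual graph that also holds zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, nbrs in capacity.items():
        for v, cap in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path is left
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# Hypothetical toy graph: two intermediate entities link subject and object.
toy = {"subject": {"a": 3, "b": 2}, "a": {"object": 2, "b": 1}, "b": {"object": 3}}
score = max_flow(toy, "subject", "object")  # 5
```

In this toy reading, a larger flow between subject and object means more connecting paths with more capacity, which is one way to picture the "knowledge stream" that supports a triple.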

SI Nov 20, 2016
Gendered Conversation in a Social Game-Streaming Platform

Supun Nakandala, Giovanni Luca Ciampaglia, Norman Makoto Su et al.

Online social media and games are increasingly replacing offline social activities. Social media is now an indispensable mode of communication; online gaming is not only a genuine social activity but also a popular spectator sport. With support for anonymity and larger audiences, online interaction shrinks social and geographical barriers. Despite such benefits, social disparities such as gender inequality persist in online social media. In particular, online gaming communities have been criticized for persistent gender disparities and objectification. As gaming evolves into a social platform, persistence of gender disparity is a pressing question. Yet, there are few large-scale, systematic studies of gender inequality and objectification in social gaming platforms. Here we analyze more than one billion chat messages from Twitch, a social game-streaming platform, to study how the gender of streamers is associated with the nature of conversation. Using a combination of computational text analysis methods, we show that gendered conversation and objectification are prevalent in chats. Female streamers receive significantly more objectifying comments while male streamers receive more game-related comments. This difference is more pronounced for popular streamers. There also exists a large number of users who post only on female or male streams. Employing a neural vector-space embedding (paragraph vector) method, we analyze gendered chat messages and create prediction models that (i) identify the gender of streamers based on messages posted in the channel and (ii) identify the gender a viewer prefers to watch based on their chat messages. Our findings suggest that disparities in social game-streaming platforms are a nuanced phenomenon that involves the gender of streamers as well as those who produce gendered and game-related conversation.

SI Oct 20, 2016
Information Overload in Group Communication: From Conversation to Cacophony in the Twitch Chat

Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Yong-Yeol Ahn et al.

Online communication channels, especially social web platforms, are rapidly replacing traditional ones. Online platforms allow users to overcome physical barriers, enabling worldwide participation. However, the power of online communication bears an important negative consequence --- we are exposed to too much information to process. Too many participants, for example, can turn online public spaces into noisy, overcrowded fora where no meaningful conversation can be held. Here we analyze a large dataset of public chat logs from Twitch, a popular video streaming platform, in order to examine how information overload affects online group communication. We measure structural and textual features of conversations such as user output, interaction, and information content per message across a wide range of information loads. Our analysis reveals the existence of a transition from a conversational state to a cacophony --- a state of overload with lower user participation, more copy-pasted messages, and less information per message. These results hold both on average and at the individual level for the majority of users. This study provides a quantitative basis for further studies of the social effects of information overload, and may guide the design of more resilient online communication systems.

SI Sep 4, 2014
MoodBar: Increasing new user retention in Wikipedia through lightweight socialization

Giovanni Luca Ciampaglia, Dario Taraborelli

Socialization in online communities allows existing members to welcome and recruit newcomers, introduce them to community norms and practices, and sustain their early participation. However, socializing newcomers does not come for free: in large communities, socialization can result in a significant workload for mentors and is hard to scale. In this study we present results from an experiment that measured the effect of a lightweight socialization tool on the activity and retention of newly registered users attempting to edit Wikipedia for the first time. Wikipedia is struggling with the retention of newcomers and our results indicate that a mechanism to elicit lightweight feedback and to provide early mentoring to newcomers improves their chances of becoming long-term contributors.