Sinan Aral

LG
h-index39
11papers
190citations
Novelty50%
AI Score51

11 Papers

95.8GNMay 29
Measuring Social Media Network Effects

Sinan Aral, Seth G Benzell, Avinash Collis et al.

Network effects -- the utility gains from additional consumers of a good -- are widely regarded as critical to the digital economy. Yet recent theory and evidence suggest that local network effects -- the economic value created by specific social network connections -- drive value in networked online platforms. Using incentive-compatible online choice experiments with 19,923 Facebook, Instagram, LinkedIn, and X users in the United States, we provide the first large-scale empirical measurement of local network effects in the digital economy and measure heterogeneity in connection value across platforms. Platform value ranges from \$78 to \$101 per consumer per month, with 8.1-23.7% explained by local network effects. We find that 1) stronger ties are more valuable on Facebook and Instagram, while weaker ties are more valuable on LinkedIn and X; 2) work connections are most valuable on LinkedIn and least on Facebook, and job-seekers value LinkedIn significantly more and Facebook significantly less; 3) men value connections to women significantly more than to other men, particularly on Instagram, Facebook, and X, while women value connections to men and women equally across platforms; 4) consumers value connections on any platform more if they are also connected on other platforms, suggesting that platforms are complements, not substitutes; 5) white consumers disproportionately value same-race connections on Facebook while, on Instagram, connections to alters eighteen or younger are valued significantly more than any other age group -- two patterns not seen on other platforms. Each platform generates between \$53B and \$215B in annual US consumer surplus. These results suggest that social media generates significant value, that local network effects drive a substantial fraction of it, and that the sources and contours of these effects vary across platforms, consumers, and connections.

97.2HCApr 13
Personality Pairing Improves Human-AI Collaboration

Harang Ju, Sinan Aral

Here we examine how AI agent "personalities" interact with human personalities to shape human-AI collaboration and performance. In a large-scale, preregistered randomized experiment, we paired 1,258 participants with AI agents prompted to exhibit varying levels of the Big Five personality traits. These human-AI teams produced 7,266 display ads for a real think tank, which we evaluated using 1,995 independent human raters and a field experiment on X that generated nearly 5 million impressions. We found that human and AI personalities individually shaped ad quality and teamwork. When examined together, human-AI personality pairings directly effected ad quality outcomes. For example, extraverted humans paired with conscientious AI produced the lowest-quality ads, followed by conscientious humans paired with agreeable AI and neurotic humans paired with conscientious AI. In the field experiment, ad quality significantly influenced ad performance, measured by click-through rates and cost-per-click, and neurotic humans paired with neurotic AI achieved higher click-through rates, even after controlling for ad quality. Together, these results provide the first large-scale causal experimental evidence that specific personality pairings can improve human-AI collaboration and motivate future research on the implications of AI personalization for performance and teamwork dynamics in human-AI teams.

13.0NIApr 14
Explaining Sustained Blockchain Decentralization with Quasi-Experiments: The Resource Flexibility of Consensus Mechanisms

Harang Ju, Madhav Kumar, Ehsan Valavi et al.

Decentralization is a fundamental design element of the Web3 economy. Blockchains and distributed consensus mechanisms are touted as fault-tolerant, attack-resistant, and collusion-proof because they are decentralized. Recent analyses, however, find some blockchains are decentralized, others are centralized, and that there are trends towards both centralization and decentralization in the blockchain economy. Despite the importance and variability of decentralization across blockchains, we still know little about what enables or constrains blockchain decentralization. We hypothesize that the resource flexibility of consensus mechanisms is a key enabler of the sustained decentralization of blockchain networks. We test this hypothesis using three quasi-experimental shocks -- policy-related, infrastructure-related, and technical -- to resources used in consensus. We find strong suggestive evidence that the resource flexibility of consensus mechanisms enables sustained blockchain decentralization and discuss the implications for the design, regulation, and implementation of blockchains.

88.4HCApr 3
The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading

Michael Caosun, Sinan Aral

Experimental evidence confirms that AI tools raise worker productivity, but also that sustained use can erode the expertise on which those gains depend. We develop a dynamic model in which a decision-maker chooses AI usage intensity for a worker over time, trading immediate productivity against the erosion of worker skill. We decompose the tool's productivity effect into two channels, one independent of worker expertise and one that scales with it. The model produces three main results. First, even a decision-maker who fully anticipates skill erosion rationally adopts AI when front-loaded productivity gains outweigh long-run skill costs, producing steady-state loss: the worker ends up less productive than before adoption. Second, when managers are short-termist or worker skill has external value, the decision-maker's optimal policy turns steady-state loss into the augmentation trap, leaving the worker worse off than if AI had never been adopted. Third, when AI productivity depends less on worker expertise, workers can permanently diverge in skill: experienced workers realize their full potential while less experienced workers deskill to zero. Small differences in managerial incentives can determine which path a worker takes. The productivity decomposition classifies deployments into five regimes that separate beneficial adoption from harmful adoption and identifies which deployments are vulnerable to the trap.

CYMar 23, 2025
Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance

Harang Ju, Sinan Aral

To uncover how AI agents change productivity, performance, and work processes, we introduce Pairit -- an experimentation platform enabling humans and AI agents to collaborate in integrative workspaces. In a large-scale marketing experiment on the platform, 2310 participants were randomly assigned to human-human and human-AI teams. The teams exchanged 183,691 messages and created 63,656 image edits, 1,960,095 ad copy edits, and 10,375 AI-generated images while producing 11,138 ads for a large think tank. Analysis of fine-grained communication, collaboration, and workflow logs revealed that collaborating with AI agents increased communication by 63% and allowed humans to engage in 71% less direct text editing. While human-AI teams engaged in 18% more process and content communication, human-human teams engaged in 29% more social and emotional communication. Humans in human-AI teams experienced 73% greater productivity per worker and produced higher-quality ad copy, while human-human teams produced higher-quality images, suggesting AI agents require fine-tuning for multimodal workflows. Field tests of the ad campaigns accumulated ~5M ad impressions and revealed that ads with higher image quality (produced by human-human collaborations) and higher text quality (produced by human-AI collaborations) performed significantly better on click-through rates, view through rates, and cost per click metrics. Together, these results suggest that human collaboration with AI agents significantly reshapes communication patterns and work processes and increases productivity, while improving some dimensions of output quality and deteriorating others. We hope the release of the extensible Pairit platform will accelerate RCTs of human-AI collaboration across a variety of work tasks and contexts.

AIMar 9, 2025
Advancing AI Negotiations: New Theory and Evidence from a Large-Scale Autonomous Negotiations Competition

Michelle Vaccaro, Michael Caosun, Harang Ju et al.

We conducted an International AI Negotiation Competition in which participants designed and refined prompts for AI negotiation agents. We then facilitated over 180,000 negotiations between these agents across multiple scenarios with diverse characteristics and objectives. Our findings revealed that principles from human negotiation theory remain crucial even in AI-AI contexts. Surprisingly, warmth--a traditionally human relationship-building trait--was consistently associated with superior outcomes across all key performance metrics. Dominant agents, meanwhile, were especially effective at claiming value. Our analysis also revealed unique dynamics in AI-AI negotiations not fully explained by existing theory, including AI-specific technical strategies like chain-of-thought reasoning, prompt injection, and strategic concealment. When we applied natural language processing (NLP) methods to the full transcripts of all negotiations we found positivity, gratitude and question-asking (associated with warmth) were strongly associated with reaching deals as well as objective and subjective value, whereas conversation lengths (associated with dominance) were strongly associated with impasses. The results suggest the need to establish a new theory of AI negotiation, which integrates classic negotiation theory with AI-specific negotiation theories to better understand autonomous negotiations and optimize agent performance.

AIMar 4, 2025
Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment

Matthew DosSantos DiSorbo, Harang Ju, Sinan Aral

Large language models (LLMs), initially developed for generative AI, are now evolving into agentic AI systems, which make decisions in complex, real-world contexts. Unfortunately, while their generative capabilities are well-documented, their decision-making processes remain poorly understood. This is particularly evident when testing targeted decision-making: for instance, how models handle exceptions, a critical and challenging aspect of decision-making made relevant by the inherent incompleteness of contracts. Here we demonstrate that LLMs, even ones that excel at reasoning, deviate significantly from human judgments because they adhere strictly to policies, even when such adherence is impractical, suboptimal, or even counterproductive. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning. We find that while ethical framework prompting fails and chain-of-thought prompting provides only slight improvements, supervised fine-tuning - specifically with human explanations - yields markedly better results. Surprisingly, in our experiments, supervised fine-tuning even enabled models to generalize human-like decision-making to novel scenarios, demonstrating transfer learning of human-aligned decision-making across contexts. Furthermore, fine-tuning with explanations, not just labels, was critical for alignment, suggesting that aligning LLMs with human judgment requires explicit training on how decisions are made, not just which decisions are made. These findings highlight the need to address LLMs' shortcomings in handling exceptions in order to guide the development of agentic AI toward models that can effectively align with human judgment and simultaneously adapt to novel contexts.

LGFeb 12, 2021
Modeling Dynamic User Interests: A Neural Matrix Factorization Approach

Paramveer Dhillon, Sinan Aral

In recent years, there has been significant interest in understanding users' online content consumption patterns. But, the unstructured, high-dimensional, and dynamic nature of such data makes extracting valuable insights challenging. Here we propose a model that combines the simplicity of matrix factorization with the flexibility of neural networks to efficiently extract nonlinear patterns from massive text data collections relevant to consumers' online consumption patterns. Our model decomposes a user's content consumption journey into nonlinear user and content factors that are used to model their dynamic interests. This natural decomposition allows us to summarize each user's content consumption journey with a dynamic probabilistic weighting over a set of underlying content attributes. The model is fast to estimate, easy to interpret and can harness external data sources as an empirical prior. These advantages make our method well suited to the challenges posed by modern datasets. We use our model to understand the dynamic news consumption interests of Boston Globe readers over five years. Thorough qualitative studies, including a crowdsourced evaluation, highlight our model's ability to accurately identify nuanced and coherent consumption patterns. These results are supported by our model's superior and robust predictive performance over several competitive baseline methods.

LGOct 29, 2020
Targeting for long-term outcomes

Jeremy Yang, Dean Eckles, Paramveer Dhillon et al.

Decision makers often want to target interventions so as to maximize an outcome that is observed only in the long-term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and policy learning literatures to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach. We first show that conditions for the validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization. We apply our approach in two large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers with the aim of maximizing long-term revenue. Using the first experiment, we evaluate this approach empirically by comparing the policy learned using imputed outcomes with a policy learned on the ground-truth, long-term outcomes. The performance of these two policies is statistically indistinguishable, and we rule out large losses from relying on surrogates. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for future subscribers. Over three years, our approach had a net-positive revenue impact in the range of $4-5 million compared to the status quo.

SIMar 17, 2020
The Engagement-Diversity Connection: Evidence from a Field Experiment on Spotify

David Holtz, Benjamin Carterette, Praveen Chandar et al.

It remains unknown whether personalized recommendations increase or decrease the diversity of content people consume. We present results from a randomized field experiment on Spotify testing the effect of personalized recommendations on consumption diversity. In the experiment, both control and treatment users were given podcast recommendations, with the sole aim of increasing podcast consumption. Treatment users' recommendations were personalized based on their music listening history, whereas control users were recommended popular podcasts among users in their demographic group. We find that, on average, the treatment increased podcast streams by 28.90%. However, the treatment also decreased the average individual-level diversity of podcast streams by 11.51%, and increased the aggregate diversity of podcast streams by 5.96%, indicating that personalized recommendations have the potential to create patterns of consumption that are homogenous within and diverse across users, a pattern reflecting Balkanization. Our results provide evidence of an "engagement-diversity trade-off" when recommendations are optimized solely to drive consumption: while personalized recommendations increase user engagement, they also affect the diversity of consumed content. This shift in consumption diversity can affect user retention and lifetime value, and impact the optimal strategy for content producers. We also observe evidence that our treatment affected streams from sections of Spotify's app not directly affected by the experiment, suggesting that exposure to personalized recommendations can affect the content that users consume organically. We believe these findings highlight the need for academics and practitioners to continue investing in personalization methods that explicitly take into account the diversity of content recommended.

LGJan 31, 2020
Scalable bundling via dense product embeddings

Madhav Kumar, Dean Eckles, Sinan Aral

Bundling, the practice of jointly selling two or more products at a discount, is a widely used strategy in industry and a well examined concept in academia. Historically, the focus has been on theoretical studies in the context of monopolistic firms and assumed product relationships, e.g., complementarity in usage. We develop a new machine-learning-driven methodology for designing bundles in a large-scale, cross-category retail setting. We leverage historical purchases and consideration sets created from clickstream data to generate dense continuous representations of products called embeddings. We then put minimal structure on these embeddings and develop heuristics for complementarity and substitutability among products. Subsequently, we use the heuristics to create multiple bundles for each product and test their performance using a field experiment with a large retailer. We combine the results from the experiment with product embeddings using a hierarchical model that maps bundle features to their purchase likelihood, as measured by the add-to-cart rate. We find that our embeddings-based heuristics are strong predictors of bundle success, robust across product categories, and generalize well to the retailer's entire assortment.