Harang Ju

AI
h-index: 39
8 papers
34 citations
Novelty: 49%
AI Score: 52

8 Papers

98.1 HC Apr 13
Personality Pairing Improves Human-AI Collaboration

Harang Ju, Sinan Aral

Here we examine how AI agent "personalities" interact with human personalities to shape human-AI collaboration and performance. In a large-scale, preregistered randomized experiment, we paired 1,258 participants with AI agents prompted to exhibit varying levels of the Big Five personality traits. These human-AI teams produced 7,266 display ads for a real think tank, which we evaluated using 1,995 independent human raters and a field experiment on X that generated nearly 5 million impressions. We found that human and AI personalities individually shaped ad quality and teamwork. When examined together, human-AI personality pairings directly affected ad quality outcomes. For example, extraverted humans paired with conscientious AI produced the lowest-quality ads, followed by conscientious humans paired with agreeable AI and neurotic humans paired with conscientious AI. In the field experiment, ad quality significantly influenced ad performance, measured by click-through rates and cost-per-click, and neurotic humans paired with neurotic AI achieved higher click-through rates, even after controlling for ad quality. Together, these results provide the first large-scale causal experimental evidence that specific personality pairings can improve human-AI collaboration and motivate future research on the implications of AI personalization for performance and teamwork dynamics in human-AI teams.

13.8 NI Apr 14
Explaining Sustained Blockchain Decentralization with Quasi-Experiments: The Resource Flexibility of Consensus Mechanisms

Harang Ju, Madhav Kumar, Ehsan Valavi et al.

Decentralization is a fundamental design element of the Web3 economy. Blockchains and distributed consensus mechanisms are touted as fault-tolerant, attack-resistant, and collusion-proof because they are decentralized. Recent analyses, however, find that some blockchains are decentralized, that others are centralized, and that there are trends toward both centralization and decentralization in the blockchain economy. Despite the importance and variability of decentralization across blockchains, we still know little about what enables or constrains blockchain decentralization. We hypothesize that the resource flexibility of consensus mechanisms is a key enabler of the sustained decentralization of blockchain networks. We test this hypothesis using three quasi-experimental shocks -- policy-related, infrastructure-related, and technical -- to resources used in consensus. We find strong suggestive evidence that the resource flexibility of consensus mechanisms enables sustained blockchain decentralization and discuss the implications for the design, regulation, and implementation of blockchains.

49.6 MA Apr 18
When Coordination Is Avoidable: A Monotonicity Analysis of Organizational Tasks

Harang Ju

Organizations devote substantial resources to coordination, yet which tasks actually require it for correctness remains unclear. The problem is acute in multi-agent AI systems, where coordination cost is directly measurable and can exceed the cost of the work itself. Distributed systems theory provides a precise criterion: coordination is required when a task specification is non-monotonic, meaning that as histories grow, new information can invalidate prior conclusions. Here we show that Thompson's classic taxonomy of interdependence maps to that criterion, yielding a decision rule for when coordination is required for correctness. We formalize the correspondence in a bridge theorem, apply the rule to 65 APQC workflows and (with a calibrated LLM) 13,417 O*NET tasks, and illustrate it in multi-agent AI simulations. Under our decompositions, 74% of workflows and 42% of O*NET tasks are monotonic, implying that up to 24-57% of coordination spending is unnecessary for correctness.
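The monotonicity criterion in this abstract can be illustrated with a small sketch. It is not the paper's bridge theorem or decision rule; it only assumes that a task specification can be modeled as a function from a history (a set of events) to a set of conclusions, and that "monotonic" means a larger history never removes a conclusion. The event encoding, the two example specifications, and all function names are hypothetical.

```python
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_monotonic(spec, universe):
    """Brute-force check: does growing the history ever retract a conclusion?

    `spec` maps a history (frozenset of events) to a frozenset of conclusions.
    It is monotonic on `universe` if small <= large implies
    spec(small) <= spec(large) for all pairs of histories.
    """
    histories = list(subsets(universe))
    for small in histories:
        for large in histories:
            if small <= large and not spec(small) <= spec(large):
                return False
    return True


def collect_reports(history):
    """Hypothetical monotonic task: the output is every report seen so far."""
    return frozenset(item for kind, item in history if kind == "report")


def unobjected_reports(history):
    """Hypothetical non-monotonic task: a later objection retracts a report."""
    objected = {item for kind, item in history if kind == "objection"}
    return frozenset(
        item for kind, item in history
        if kind == "report" and item not in objected
    )


# Tiny illustrative event universe (an assumption, not data from the paper).
UNIVERSE = {("report", "a"), ("report", "b"), ("objection", "a")}

if __name__ == "__main__":
    print(is_monotonic(collect_reports, UNIVERSE))
    print(is_monotonic(unobjected_reports, UNIVERSE))
```

In this toy setting, agents working on `collect_reports` can publish partial results without waiting for each other, whereas `unobjected_reports` cannot be finalized until all objections are known, which is where coordination enters.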

52.3 LG Mar 31
Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models

Matthew DosSantos DiSorbo, Harang Ju

Effective automation hinges on deciding when to act and when to escalate. We model this as a decision under uncertainty: an LLM forms a prediction, estimates its probability of being correct, and compares the expected costs of acting and escalating. Using this framework across five domains of recorded human decisions -- demand forecasting, content recommendation, content moderation, loan approval, and autonomous driving -- and across multiple model families, we find marked differences in the implicit thresholds models use to trade off these costs. These thresholds vary substantially and are not predicted by architecture or scale, while self-estimates are miscalibrated in model-specific ways. We then test interventions that target this decision process by varying cost ratios, providing accuracy signals, and training models to follow the desired escalation rule. Prompting helps mainly for reasoning models. SFT on chain-of-thought targets yields the most robust policies, which generalize across datasets, cost ratios, prompt framings, and held-out domains. These results suggest that escalation behavior is a model-specific property that should be characterized before deployment, and that robust alignment benefits from training models to reason explicitly about uncertainty and decision costs.
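The act-or-escalate comparison described in this abstract can be sketched as a simple expected-cost rule. This is a minimal illustration under stated assumptions, not the paper's exact formulation: it assumes a correct action costs nothing, a wrong action costs a fixed `error` amount, and escalating costs a fixed `escalate` amount regardless of the prediction. The `Costs` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical cost model: both values are assumptions for illustration."""
    error: float     # cost of acting on a wrong prediction (must be > 0)
    escalate: float  # fixed cost of handing the case to a human


def expected_cost_of_acting(p_correct: float, costs: Costs) -> float:
    """Expected cost of acting, assuming a correct action costs nothing."""
    return (1.0 - p_correct) * costs.error


def should_escalate(p_correct: float, costs: Costs) -> bool:
    """Escalate when acting is expected to cost more than escalating."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability in [0, 1]")
    return expected_cost_of_acting(p_correct, costs) > costs.escalate


def implied_threshold(costs: Costs) -> float:
    """Confidence at or above which acting is no more costly than escalating.

    Solving (1 - p) * error <= escalate for p gives p >= 1 - escalate / error.
    """
    if costs.error <= 0:
        raise ValueError("error cost must be positive")
    return max(0.0, 1.0 - costs.escalate / costs.error)


if __name__ == "__main__":
    costs = Costs(error=10.0, escalate=2.0)
    for p in (0.7, 0.9):
        print(p, should_escalate(p, costs))
    print(implied_threshold(costs))
```

Under this rule the cost ratio alone fixes the threshold, so a model whose observed behavior implies a different threshold, or whose `p_correct` estimate is miscalibrated, will escalate too often or too rarely relative to the stated costs.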

60.6 LG May 7
SMolLM: Small Language Models Learn Small Molecular Grammar

Akhil Jindal, Harang Ju

Language models for molecular design have scaled to hundreds of millions of parameters, yet how they learn chemical grammar is poorly understood. We train SMolLM, a 53K-parameter weight-shared transformer, to generate novel SMILES with 95% validity on the ZINC-250K drug-like-molecule benchmark, outperforming a standard GPT with 10 times more parameters. Mechanistically, the same block resolves SMILES constraints across passes in a fixed order: brackets first, rings second, and valence last, as shown by error classification, linear probing, and sparse autoencoders. A systematic ablation across attention heads and passes further localizes the first bracket-matching step to a single attention head. Together, these results yield a compact, mechanistically interpretable molecular generator and a testbed for studying iterative computation in formal-language domains.
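Two of the SMILES constraints named in this abstract, bracket matching and ring closure, can be checked with a few lines of code. The sketch below is only a simplified illustration of what those constraints mean; it is not the SMolLM model or the paper's error classifier. It assumes "brackets" covers both branch parentheses and bracket atoms, handles single-digit ring labels only (two-digit `%nn` labels are not handled), and does not check valence. The function names are hypothetical.

```python
def brackets_balanced(smiles: str) -> bool:
    """Check that branch parentheses and bracket atoms are properly nested."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def rings_closed(smiles: str) -> bool:
    """Check that every single-digit ring label is opened and then closed.

    Digits inside bracket atoms (isotopes, charges, hydrogen counts) are
    skipped because they are not ring-closure labels.
    """
    open_rings = set()
    in_bracket_atom = False
    for ch in smiles:
        if ch == "[":
            in_bracket_atom = True
        elif ch == "]":
            in_bracket_atom = False
        elif ch.isdigit() and not in_bracket_atom:
            # First occurrence opens the ring, second occurrence closes it.
            open_rings ^= {ch}
    return not open_rings


if __name__ == "__main__":
    for smiles in ("c1ccccc1", "CC(=O)O", "C1CC", "CC(C"):
        print(smiles, brackets_balanced(smiles), rings_closed(smiles))
```

Passing both checks is necessary but not sufficient for validity; a full validity test would also need the valence check, which in practice is usually delegated to a cheminformatics toolkit.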

CY Mar 23, 2025
Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance

Harang Ju, Sinan Aral

To uncover how AI agents change productivity, performance, and work processes, we introduce Pairit -- an experimentation platform enabling humans and AI agents to collaborate in integrative workspaces. In a large-scale marketing experiment on the platform, 2,310 participants were randomly assigned to human-human and human-AI teams. The teams exchanged 183,691 messages and created 63,656 image edits, 1,960,095 ad copy edits, and 10,375 AI-generated images while producing 11,138 ads for a large think tank. Analysis of fine-grained communication, collaboration, and workflow logs revealed that collaborating with AI agents increased communication by 63% and allowed humans to engage in 71% less direct text editing. While human-AI teams engaged in 18% more process and content communication, human-human teams engaged in 29% more social and emotional communication. Humans in human-AI teams experienced 73% greater productivity per worker and produced higher-quality ad copy, while human-human teams produced higher-quality images, suggesting AI agents require fine-tuning for multimodal workflows. Field tests of the ad campaigns accumulated ~5M ad impressions and revealed that ads with higher image quality (produced by human-human collaborations) and higher text quality (produced by human-AI collaborations) performed significantly better on click-through rate, view-through rate, and cost-per-click metrics. Together, these results suggest that human collaboration with AI agents significantly reshapes communication patterns and work processes and increases productivity, while improving some dimensions of output quality and deteriorating others. We hope the release of the extensible Pairit platform will accelerate RCTs of human-AI collaboration across a variety of work tasks and contexts.

AI Mar 9, 2025
Advancing AI Negotiations: New Theory and Evidence from a Large-Scale Autonomous Negotiations Competition

Michelle Vaccaro, Michael Caosun, Harang Ju et al.

We conducted an International AI Negotiation Competition in which participants designed and refined prompts for AI negotiation agents. We then facilitated over 180,000 negotiations between these agents across multiple scenarios with diverse characteristics and objectives. Our findings revealed that principles from human negotiation theory remain crucial even in AI-AI contexts. Surprisingly, warmth--a traditionally human relationship-building trait--was consistently associated with superior outcomes across all key performance metrics. Dominant agents, meanwhile, were especially effective at claiming value. Our analysis also revealed unique dynamics in AI-AI negotiations not fully explained by existing theory, including AI-specific technical strategies like chain-of-thought reasoning, prompt injection, and strategic concealment. When we applied natural language processing (NLP) methods to the full transcripts of all negotiations, we found that positivity, gratitude, and question-asking (associated with warmth) were strongly associated with reaching deals as well as with objective and subjective value, whereas conversation lengths (associated with dominance) were strongly associated with impasses. The results suggest the need to establish a new theory of AI negotiation, which integrates classic negotiation theory with AI-specific negotiation theories to better understand autonomous negotiations and optimize agent performance.

AI Mar 4, 2025
Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment

Matthew DosSantos DiSorbo, Harang Ju, Sinan Aral

Large language models (LLMs), initially developed for generative AI, are now evolving into agentic AI systems, which make decisions in complex, real-world contexts. Unfortunately, while their generative capabilities are well-documented, their decision-making processes remain poorly understood. This is particularly evident when testing targeted decision-making: for instance, how models handle exceptions, a critical and challenging aspect of decision-making made relevant by the inherent incompleteness of contracts. Here we demonstrate that LLMs, even ones that excel at reasoning, deviate significantly from human judgments because they adhere strictly to policies, even when such adherence is impractical, suboptimal, or even counterproductive. We then evaluate three approaches to tuning AI agents to handle exceptions: ethical framework prompting, chain-of-thought reasoning, and supervised fine-tuning. We find that while ethical framework prompting fails and chain-of-thought prompting provides only slight improvements, supervised fine-tuning -- specifically with human explanations -- yields markedly better results. Surprisingly, in our experiments, supervised fine-tuning even enabled models to generalize human-like decision-making to novel scenarios, demonstrating transfer learning of human-aligned decision-making across contexts. Furthermore, fine-tuning with explanations, not just labels, was critical for alignment, suggesting that aligning LLMs with human judgment requires explicit training on how decisions are made, not just which decisions are made. These findings highlight the need to address LLMs' shortcomings in handling exceptions in order to guide the development of agentic AI toward models that can effectively align with human judgment and simultaneously adapt to novel contexts.