AIFeb 23
Agents of ChaosNatalie Shapira, Chris Wendler, Avery Yen et al.
We report an exploratory red-teaming study of autonomous language-model-powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation.
HCJul 3, 2024
AI's Social Forcefield: Reshaping Distributed Cognition in Human-AI TeamsChristoph Riedl, Saiph Savage, Josie Zvelebilova
AI is not only a neutral tool in team settings; it actively reshapes the social and cognitive fabric of collaboration. We advance a unified framework of alignment in distributed cognition in human-AI teams -- a process through which linguistic, cognitive, and social coordination emerge as human and AI agents co-construct a shared representational space. Across two studies, we show that exposure to AI-generated language shapes not only how people speak, but also how they think, what they attend to, and how they relate to each other. Together, these findings reveal how AI participation reorganizes the distributed cognitive architecture of teams: AI systems function as implicit social forcefields. Our findings highlight the double-edged impact of AI: the same mechanisms that enable efficient collaboration can also erode epistemic diversity and undermine natural alignment processes. We argue for rethinking AI in teams as a socially influential actor and call for new design paradigms that foreground transparency, controllability, and group-level dynamics to foster responsible, productive human-AI collaboration.
70.1HCApr 21
Personalized AI Scaffolds Synergistic Multi-Turn Collaboration in Creative WorkSean Kelley, David De Cremer, Christoph Riedl
As AI becomes more deeply embedded in knowledge work, building assistants that support human creativity and expertise becomes more important. Yet achieving synergy in human-AI collaboration is not easy. Providing AI with detailed information about a user's demographics, psychological attributes, divergent thinking, and domain expertise may improve performance by scaffolding more effective multi-turn interactions. We implemented a personalized LLM-based assistant, informed by users' psychometric profiles and an AI-guided interview about their work style, to help users complete a marketing task for a fictional startup. We randomized 331 participants to work with AI that was either generic (n = 116), partially personalized (n = 114), or fully personalized (n=101). Participants working with personalized AI produce marketing campaigns of significantly higher quality and creativity, beyond what AI alone could have produced. Compared to generic AI, personalized AI leads to higher self-reported levels of assistance and feedback, while also increasing participant trust and confidence. Causal mediation analysis shows that personalization improves performance indirectly by enhancing collective memory, attention, and reasoning in the human-AI interaction. These findings provide a theory-driven framework in which personalization functions as external scaffolding that builds common ground and shared partner models, reducing uncertainty and enhancing joint cognition. This informs the design of future AI assistants that maximize synergy and support human creative potential while limiting negative homogenization.
GNSep 27, 2024
Effects of AI Feedback on Learning, the Skill Gap, and Intellectual DiversityChristoph Riedl, Eric Bogert
Can human decision-makers learn from AI feedback? Using data on 52,000 decision-makers from a large online chess platform, we investigate how their AI use affects three interrelated long-term outcomes: Learning, skill gap, and diversity of decision strategies. First, we show that individuals are far more likely to seek AI feedback in situations in which they experienced success rather than failure. This AI feedback seeking strategy turns out to be detrimental to learning: Feedback on successes decreases future performance, while feedback on failures increases it. Second, higher-skilled decision-makers seek AI feedback more often and are far more likely to seek AI feedback after a failure, and benefit more from AI feedback than lower-skilled individuals. As a result, access to AI feedback increases, rather than decreases, the skill gap between high- and low-skilled individuals. Finally, we leverage 42 major platform updates as natural experiments to show that access to AI feedback causes a decrease in intellectual diversity of the population as individuals tend to specialize in the same areas. Together, those results indicate that learning from AI feedback is not automatic and using AI correctly seems to be a skill itself. Furthermore, despite its individual-level benefits, access to AI feedback can have significant population-level downsides including loss of intellectual diversity and an increasing skill gap.
CLMay 20, 2025
Language Models use Lookbacks to Track BeliefsNikhil Prakash, Natalie Shapira, Arnab Sen Sharma et al.
How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze LMs' ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset, CausalToM, consisting of simple stories where two characters independently change the state of two objects, potentially unaware of each other's actions. Our investigation uncovers a pervasive algorithmic pattern that we call a lookback mechanism, which enables the LM to recall important information when it becomes necessary. The LM binds each character-object-state triple together by co-locating their reference information, represented as Ordering IDs (OIs), in low-rank subspaces of the state token's residual stream. When asked about a character's beliefs regarding the state of an object, the binding lookback retrieves the correct state OI and then the answer lookback retrieves the corresponding state token. When we introduce text specifying that one character is (not) visible to the other, we find that the LM first generates a visibility ID encoding the relation between the observing and the observed character OIs. In a visibility lookback, this ID is used to retrieve information about the observed character and update the observing character's beliefs. Our work provides insights into belief tracking mechanisms, taking a step toward reverse-engineering ToM reasoning in LMs.
MAOct 5, 2025
Emergent Coordination in Multi-Agent Language ModelsChristoph Riedl
When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test -- in a purely data-driven way -- whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. We implement both a practical criterion and an emergence capacity criterion operationalized as partial information decomposition of time-delayed mutual information (TDMI). We apply our framework to experiments using a simple guessing game without direct agent communication and only minimal group-level feedback with three randomized interventions. Groups in the control condition exhibit strong temporal synergy but only little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to ``think about what other agents might do'' shows identity-linked differentiation and goal-directed complementarity across agents. Taken together, our framework establishes that multi-agent LLM systems can be steered with prompt design from mere aggregates to higher-order collectives. Our results are robust across emergence measures and entropy estimators, and not explained by coordination-free baselines or temporal dynamics alone. Without attributing human-like cognition to the agents, the patterns of interaction we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members.
CLJun 16, 2024
Large Language Models for Automatic Milestone Detection in Group DiscussionsZhuoxu Duan, Zhengye Yang, Samuel Westby et al.
Large language models like GPT have proven widely successful on natural language understanding tasks based on written text documents. In this paper, we investigate an LLM's performance on recordings of a group oral communication task in which utterances are often truncated or not well-formed. We propose a new group task experiment involving a puzzle with several milestones that can be achieved in any order. We investigate methods for processing transcripts to detect if, when, and by whom a milestone has been completed. We demonstrate that iteratively prompting GPT with transcription chunks outperforms semantic similarity search methods using text embeddings, and further discuss the quality and randomness of GPT responses under different context window sizes.
APJul 10, 2018
Optimal design of experiments to identify latent behavioral typesStefano Balietti, Brennan Klein, Christoph Riedl
Bayesian optimal experiments that maximize the information gained from collected data are critical to efficiently identify behavioral models. We extend a seminal method for designing Bayesian optimal experiments by introducing two computational improvements that make the procedure tractable: (1) a search algorithm from artificial intelligence that efficiently explores the space of possible design parameters, and (2) a sampling procedure which evaluates each design parameter combination more efficiently. We apply our procedure to a game of imperfect information to evaluate and quantify the computational improvements. We then collect data across five different experimental designs to compare the ability of the optimal experimental design to discriminate among competing behavioral models against the experimental designs chosen by a "wisdom of experts" prediction experiment. We find that data from the experiment suggested by the optimal design approach requires significantly less data to distinguish behavioral models (i.e., test hypotheses) than data from the experiment suggested by experts. Substantively, we find that reinforcement learning best explains human decision-making in the imperfect information game and that behavior is not adequately described by the Bayesian Nash equilibrium. Our procedure is general and computationally efficient and can be applied to dynamically optimize online experiments.
CVOct 24, 2014
Detecting Figures and Part Labels in Patents: Competition-Based Development of Image Processing AlgorithmsChristoph Riedl, Richard Zanibbi, Marti A. Hearst et al.
We report the findings of a month-long online competition in which participants developed algorithms for augmenting the digital version of patent documents published by the United States Patent and Trademark Office (USPTO). The goal was to detect figures and part labels in U.S. patent drawing pages. The challenge drew 232 teams of two, of which 70 teams (30%) submitted solutions. Collectively, teams submitted 1,797 solutions that were compiled on the competition servers. Participants reported spending an average of 63 hours developing their solutions, resulting in a total of 5,591 hours of development time. A manually labeled dataset of 306 patents was used for training, online system tests, and evaluation. The design and performance of the top-5 systems are presented, along with a system developed after the competition which illustrates that winning teams produced near state-of-the-art results under strict time and computation constraints. For the 1st place system, the harmonic mean of recall and precision (f-measure) was 88.57% for figure region detection, 78.81% for figure regions with correctly recognized figure titles, and 70.98% for part label detection and character recognition. Data and software from the competition are available through the online UCI Machine Learning repository to inspire follow-on work by the image processing community.