AIJan 5, 2023Code
Reasoning about Causality in GamesLewis Hammond, James Fox, Tom Everitt et al. · deepmind
Causal reasoning and game-theoretic reasoning are fundamental topics in artificial intelligence, among many other disciplines: this paper is concerned with their intersection. Despite their importance, a formal framework that supports both these forms of reasoning has, until now, been lacking. We offer a solution in the form of (structural) causal games, which can be seen as extending Pearl's causal hierarchy to the game-theoretic domain, or as extending Koller and Milch's multi-agent influence diagrams to the causal domain. We then consider three key questions: i) How can the (causal) dependencies in games - either between variables, or between strategies - be modelled in a uniform, principled manner? ii) How may causal queries be computed in causal games, and what assumptions does this require? iii) How do causal games compare to existing formalisms? To address question i), we introduce mechanised games, which encode dependencies between agents' decision rules and the distributions governing the game. In response to question ii), we present definitions of predictions, interventions, and counterfactuals, and discuss the assumptions required for each. Regarding question iii), we describe correspondences between causal games and other formalisms, and explain how causal games can be used to answer queries that other causal or game-theoretic models do not support. Finally, we highlight possible applications of causal games, aided by an extensive open-source Python library.
MAOct 13, 2023Code
Welfare Diplomacy: Benchmarking Language Model CooperationGabriel Mukobi, Hannah Erlebach, Niklas Lauffer et al.
The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted language models; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.
47.2CYJun 1
Legal Alignment for Safe and Ethical AINoam Kolt, Nicholas Caputo, Jack Boeglin et al.
Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally overlooked an important source of knowledge and practice for grappling with these problems: law. In this paper, we survey the emerging field of legal alignment that aims to fill this gap and systematize research that studies how legal rules, principles, and methods can be leveraged to address problems of alignment and inform the design of AI systems that operate safely and ethically. Our survey provides a taxonomy of the three core research pathways of legal alignment and explores how each can be operationalized in practice: (1) designing AI systems to comply with the content of legal rules developed through legitimate institutions and processes, (2) adapting methods from legal interpretation to guide how AI systems reason and make decisions, and (3) harnessing legal concepts as a structural blueprint for confronting challenges of reliability, trust, and cooperation in AI systems. These research pathways present new conceptual, empirical, and institutional questions, which include examining the specific set of laws that particular AI systems should follow, creating evaluations to assess their legal compliance in real-world settings, and developing governance frameworks to support the implementation of legal alignment in practice. Tackling these questions requires expertise across law, computer science, and other disciplines, offering these communities the opportunity to collaborate in designing AI for the better.
LGDec 28, 2022
Lexicographic Multi-Objective Reinforcement LearningJoar Skalse, Lewis Hammond, Charlie Griffin et al.
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, demonstrating their practical applicability. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
AIDec 3, 2025
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using ConcordiaChandler Smith, Marwa Abdulhai, Manfred Diaz et al.
Large Language Model (LLM) agents have demonstrated impressive capabilities for social interaction and are increasingly being deployed in situations where they might engage with both human and artificial agents. These interactions represent a critical frontier for LLM-based agents, yet existing evaluation methods fail to measure how well these capabilities generalize to novel social situations. In this paper, we introduce a method for evaluating the ability of LLM-based agents to cooperate in zero-shot, mixed-motive environments using Concordia, a natural language multi-agent simulation environment. Our method measures general cooperative intelligence by testing an agent's ability to identify and exploit opportunities for mutual gain across diverse partners and contexts. We present empirical results from the NeurIPS 2024 Concordia Contest, where agents were evaluated on their ability to achieve mutual gains across a suite of diverse scenarios ranging from negotiation to collective action problems. Our findings reveal significant gaps between current agent capabilities and the robust generalization required for reliable cooperation, particularly in scenarios demanding persuasion and norm enforcement.
GTJul 11, 2023
On Imperfect Recall in Multi-Agent Influence DiagramsJames Fox, Matt MacDermott, Lewis Hammond et al.
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfect recall, however, a Nash equilibrium in behavioural policies may not exist. We overcome this by showing how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. We also analyse the computational complexity of key decision problems in MAIDs, and explore tractable cases. Finally, we describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
19.6AIApr 30
Causal Foundations of Collective AgencyFrederik Hytting Jørgensen, Sebastian Weichwald, Lewis Hammond
A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group's joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games -- which are causal models of strategic, multi-agent interactions -- and causal abstraction -- which formalizes when a simple, high-level model faithfully captures a more complex, low-level model. We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.
LGSep 30, 2022
Bounded Robustness in Reinforcement Learning via Lexicographic ObjectivesDaniel Jarne Ornia, Licio Romao, Lewis Hammond et al.
Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.
20.2CYMay 23
Habermolt: Delegating Deliberation to AI RepresentativesJoseph Low, Oscar Duys, Claude Formanek et al.
Deliberative democracy arguably leads to better collective decisions, but is fundamentally constrained by human attention and bandwidth. While recent AI-mediated deliberations scale participation by synthesizing inputs from many humans, they remain time-intensive for individual users. As AI models become increasingly capable, AI systems are being deployed not only to mediate deliberation between humans, but to represent humans in it: where AI agents deliberate on behalf of human users. We call this paradigm AI-delegated deliberation. While it promises unprecedented scale for democratic participation, it introduces qualitatively new design and alignment challenges that are poorly understood and under-theorized. To study these dynamics empirically, we deploy Habermolt, a public platform for AI-delegated deliberation. We evaluate its effectiveness along three dimensions that we use to organize any deliberative system: representation, aggregation, and revision. We use these observations to illuminate the design decisions future AI-delegated deliberation platforms must confront, contributing to the broader research agenda for scalable yet trustworthy AI representatives.
LGApr 15, 2024
Foundational Challenges in Assuring Alignment and Safety of Large Language ModelsUsman Anwar, Abulhair Saparov, Javier Rando et al. · cambridge, eth-zurich
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.
MAFeb 9, 2021Code
Equilibrium Refinements for Multi-Agent Influence Diagrams: Theory and PracticeLewis Hammond, James Fox, Tom Everitt et al.
Multi-agent influence diagrams (MAIDs) are a popular form of graphical model that, for certain classes of games, have been shown to offer key complexity and explainability advantages over traditional extensive form game (EFG) representations. In this paper, we extend previous work on MAIDs by introducing the concept of a MAID subgame, as well as subgame perfect and trembling hand perfect equilibrium refinements. We then prove several equivalence results between MAIDs and EFGs. Finally, we describe an open source implementation for reasoning about MAIDs and computing their equilibria.
CYJan 23, 2024
Visibility into AI AgentsAlan Chan, Carson Ezell, Max Kaufmann et al. · cambridge
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.
MAFeb 19, 2025
Multi-Agent Risks from Advanced AILewis Hammond, Alan Chan, Jesse Clifton et al. · stanford
The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.
AIFeb 12, 2024
Secret Collusion among AI Agents: Multi-Agent Deception via SteganographySumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier et al.
Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.
AIDec 12, 2024
Neural Interactive ProofsLewis Hammond, Sam Adam-Day
We consider the problem of how a trusted, but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task. More specifically, we study the case in which agents are represented using neural networks and refer to solutions of this problem as neural interactive proofs. First we introduce a unifying framework based on prover-verifier games, which generalises previously proposed interaction protocols. We then describe several new protocols for generating neural interactive proofs, and provide a theoretical comparison of both new and existing approaches. Finally, we support this theory with experiments in two domains: a toy graph isomorphism problem that illustrates the key ideas, and a code validation task using large language models. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application in building safer AI systems.
GTOct 18, 2024
Game Theory with Simulation in the Presence of Unpredictable RandomisationVojtech Kovarik, Nathaniel Sauerberg, Lewis Hammond et al.
AI agents will be predictable in certain ways that traditional agents are not. Where and how can we leverage this predictability in order to improve social welfare? We study this question in a game-theoretic setting where one agent can pay a fixed cost to simulate the other in order to learn its mixed strategy. As a negative result, we prove that, in contrast to prior work on pure-strategy simulation, enabling mixed-strategy simulation may no longer lead to improved outcomes for both players in all so-called "generalised trust games". In fact, mixed-strategy simulation does not help in any game where the simulatee's action can depend on that of the simulator. We also show that, in general, deciding whether simulation introduces Pareto-improving Nash equilibria in a given game is NP-hard. As positive results, we establish that mixed-strategy simulation can improve social welfare if the simulator has the option to scale their level of trust, if the players face challenges with both trust and coordination, or if maintaining some level of privacy is essential for enabling cooperation.
GTFeb 24, 2024
Cooperation and Control in Delegation GamesOliver Sourbut, Lewis Hammond, Harriet Wood
Many settings of interest involving humans and machines -- from virtual personal assistants to autonomous vehicles -- can naturally be modelled as principals (humans) delegating to agents (machines), which then interact with each other on their principals' behalf. We refer to these multi-principal, multi-agent scenarios as delegation games. In such games, there are two important failure modes: problems of control (where an agent fails to act in line their principal's preferences) and problems of cooperation (where the agents fail to work well together). In this paper we formalise and analyse these problems, further breaking them down into issues of alignment (do the players have similar preferences?) and capabilities (how competent are the players at satisfying those preferences?). We show -- theoretically and empirically -- how these measures determine the principals' welfare, how they can be estimated using limited observations, and thus how they might be used to help us design more aligned and cooperative AI systems.
CYOct 19, 2025
Agentic InequalityMatthew Sharp, Omer Bilgin, Iason Gabriel et al.
Autonomous AI agents, capable of complex planning and action, represent a significant technological evolution beyond current generative tools. As these systems become integrated into political and economic life, their distribution and capabilities will be highly consequential. This paper introduces and explores "agentic inequality" - the potential disparities in power, opportunity, and outcomes stemming from differential access to, and capabilities of, AI agents. We analyse the dual potential of this technology, exploring how agents could both exacerbate existing divides and, under the right conditions, serve as a powerful equalising force. To this end, the paper makes three primary contributions. First, it establishes an analytical framework by delineating the three core dimensions through which this inequality can manifest: disparities in the availability, quality, and quantity of agents. Second, it argues that agentic inequality is distinct from prior technological divides. Unlike tools that primarily augment human abilities, agents act as autonomous delegates, creating novel power asymmetries through scalable goal delegation and direct agent-to-agent competition that are poised to reshape outcomes across economic and socio-political spheres. Finally, it provides a systematic analysis of the technical and socioeconomic drivers - from model release strategies to market incentives - that will shape the distribution of agentic power, concluding with a research agenda for navigating the complex governance challenges ahead.
AIJun 17, 2024
IDs for AI SystemsAlan Chan, Noam Kolt, Peter Wills et al.
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of domains, IDs address analogous problems by identifying particular entities (e.g., a particular Boeing 747) and providing information about other entities of the same class (e.g., some or all Boeing 747s). We propose a framework in which IDs are ascribed to instances of AI systems (e.g., a particular chat session with Claude 3), and associated information is accessible to parties seeking to interact with that system. We characterize IDs for AI systems, provide concrete examples where IDs could be useful, argue that there could be significant demand for IDs from key actors, analyze how those actors could incentivize ID adoption, explore a potential implementation of our framework for deployers of AI systems, and highlight limitations and risks. IDs seem most warranted in settings where AI systems could have a large impact upon the world, such as in making financial transactions or contacting real humans. With further study, IDs could help to manage a world where AI systems pervade society.
MAJul 19, 2021
Rational Verification for Probabilistic SystemsJulian Gutierrez, Lewis Hammond, Anthony W. Lin et al.
Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focussed on deterministic systems. In this paper, we develop the theory and algorithms for rational verification in probabilistic systems. We focus on concurrent stochastic games (CSGs), which can be used to model uncertainty and randomness in complex multi-agent environments. We study the rational verification problem for both non-cooperative games and cooperative games in the qualitative probabilistic setting. In the former case, we consider LTL properties satisfied by the Nash equilibria of the game and in the latter case LTL properties satisfied by the core. In both cases, we show that the problem is 2EXPTIME-complete, thus not harder than the much simpler verification problem of model checking LTL properties of systems modelled as Markov decision processes (MDPs).
AIFeb 1, 2021
Multi-Agent Reinforcement Learning with Temporal Logic SpecificationsLewis Hammond, Alessandro Abate, Julian Gutierrez et al.
In this paper, we study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment, which may exhibit probabilistic behaviour. From a learning perspective these specifications provide a rich formal language with which to capture tasks or objectives, while from a logic and automated verification perspective the introduction of learning capabilities allows for practical applications in large, stochastic, unknown environments. The existing work in this area is, however, limited. Of the frameworks that consider full linear temporal logic or have correctness guarantees, all methods thus far consider only the case of a single temporal logic specification and a single agent. In order to overcome this limitation, we develop the first multi-agent reinforcement learning technique for temporal logic specifications, which is also novel in its ability to handle multiple specifications. We provide correctness and convergence guarantees for our main algorithm - ALMANAC (Automaton/Logic Multi-Agent Natural Actor-Critic) - even when using function approximation. Alongside our theoretical results, we further demonstrate the applicability of our technique via a set of preliminary experiments.
AIOct 8, 2018
Learning Tractable Probabilistic Models for Moral Responsibility and BlameLewis Hammond, Vaishak Belle
Moral responsibility is a major concern in autonomous systems, with applications ranging from self-driving cars to kidney exchanges. Although there have been recent attempts to formalise responsibility and blame, among similar notions, the problem of learning within these formalisms has been unaddressed. From the viewpoint of such systems, the urgent questions are: (a) How can models of moral scenarios and blameworthiness be extracted and learnt automatically from data? (b) How can judgements be computed effectively and efficiently, given the split-second decision points faced by some systems? By building on constrained tractable probabilistic learning, we propose and implement a hybrid (between data-driven and rule-based methods) learning framework for inducing models of such scenarios automatically from data and reasoning tractably from them. We report on experiments that compare our system with human judgement in three illustrative domains: lung cancer staging, teamwork management, and trolley problems.