MAMay 18
The Dynamics of Policy Gradient in Social Dilemmas with Partner SelectionBenedict Russell, Chin-wing Leung, Paolo Turrini
In social dilemmas self-interested learning agents face the choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists on the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this is largely available through agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.
AIMar 25
Trust as Monitoring: Evolutionary Dynamics of User Trust and AI Developer BehaviourAdeela Bashir, Zhao Song, Ndidi Bianca Ogbo et al.
AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users' trust as a one-shot adoption choice rather than as a dynamic, evolving process shaped by repeated interactions. We instead model trust as reduced monitoring in a repeated, asymmetric interaction between users and AI developers, where checking AI behaviour is costly. Using evolutionary game theory, we study how user trust strategies and developer choices between safe (compliant) and unsafe (non-compliant) AI co-evolve under different levels of monitoring cost and institutional regimes. We complement the infinite-population replicator analysis with stochastic finite-population dynamics and reinforcement learning (Q-learning) simulations. Across these approaches, we find three robust long-run regimes: no adoption with unsafe development, unsafe but widely adopted systems, and safe systems that are widely adopted. Only the last is desirable, and it arises when penalties for unsafe behaviour exceed the extra cost of safety and users can still afford to monitor at least occasionally. Our results formally support governance proposals that emphasise transparency, low-cost monitoring, and meaningful sanctions, and they show that neither regulation alone nor blind user trust is sufficient to prevent evolutionary drift towards unsafe or low-adoption outcomes.
MAJun 29, 2020
The Evolutionary Dynamics of Independent Learning Agents in Population GamesShuyue Hu, Chin-Wing Leung, Ho-fung Leung et al.
Understanding the evolutionary dynamics of reinforcement learning under multi-agent settings has long remained an open problem. While previous works primarily focus on 2-player games, we consider population games, which model the strategic interactions of a large population comprising small and anonymous agents. This paper presents a formal relation between stochastic processes and the dynamics of independent learning agents who reason based on the reward signals. Using a master equation approach, we provide a novel unified framework for characterising population dynamics via a single partial differential equation (Theorem 1). Through a case study involving Cross learning agents, we illustrate that Theorem 1 allows us to identify qualitatively different evolutionary dynamics, to analyse steady states, and to gain insights into the expected behaviour of a population. In addition, we present extensive experimental results validating that Theorem 1 holds for a variety of learning methods and population games.