MA · May 18

The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection

arXiv:2605.18185 · 2.5
Predicted impact: top 99% in MA · last 90 days
Originality: Incremental advance
AI Analysis

For researchers studying multi-agent reinforcement learning and cooperation, this paper offers a theoretical foundation for how partner selection mechanisms promote cooperation, moving beyond agent-based simulations.

This paper provides an analytical solution to policy-gradient dynamics in social dilemmas with partner selection, proving that partner selection promotes cooperation under simple rules and that population variance is necessary for cooperation to emerge. The stochastic model accurately captures dynamics and clarifies the effect of learning rate on cooperation.

In social dilemmas, self-interested learning agents face a choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists for the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this evidence comes largely from agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove that this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.
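
The claim that partner selection changes the opponent distribution, and hence the reward landscape, can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the donation-game payoffs `b` and `c`, the sigmoid policy parameterised by `thetas`, and the hypothetical selection rule in which an agent is chosen in proportion to its cooperation probability. The sketch only compares the sign of one agent's expected-reward gradient under uniform matching and under this selection rule; it does not reproduce the paper's conditions or proofs.

```python
import math


def sigmoid(x):
    """Map a policy parameter to a cooperation probability."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_reward(i, thetas, b=3.0, c=1.0, select=True):
    """Expected reward agent i collects from being chosen as a partner.

    Donation game (assumed): cooperating costs c and gives b to the partner.
    """
    p = [sigmoid(t) for t in thetas]
    n = len(p)
    total = 0.0
    for j in range(n):
        if j == i:
            continue
        if select:
            # Hypothetical rule: j picks partner k != j with probability
            # proportional to k's cooperation probability.
            weight = p[i] / sum(p[k] for k in range(n) if k != j)
        else:
            # Baseline: j picks a partner uniformly at random.
            weight = 1.0 / (n - 1)
        total += weight * (b * p[j] - c * p[i])
    return total


def policy_gradient(i, thetas, eps=1e-6, **kwargs):
    """Central finite-difference gradient of agent i's expected reward
    with respect to its own policy parameter."""
    up = list(thetas)
    down = list(thetas)
    up[i] += eps
    down[i] -= eps
    diff = expected_reward(i, up, **kwargs) - expected_reward(i, down, **kwargs)
    return diff / (2.0 * eps)


# Illustrative population of four agents (values chosen arbitrarily).
thetas = [0.0, 1.0, -1.0, 0.5]
print("uniform matching:", policy_gradient(0, thetas, select=False))
print("partner selection:", policy_gradient(0, thetas, select=True))
```

In this toy setting the gradient under uniform matching is negative, since cooperating only adds cost, while under the assumed selection rule it can turn positive because cooperating raises the chance of being chosen by cooperative partners.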

