Evolution of Societies via Reinforcement Learning
This work addresses the problem of scaling multi-agent reinforcement learning (MARL) simulations for researchers studying evolutionary dynamics. The contribution is incremental: it adapts existing methods to a new context.
The authors tackle the challenge of simulating large, heterogeneous populations of co-learning MARL agents by developing a fast, parallelizable implementation of Policy Gradient (PG) and Learning with Opponent-Learning Awareness (LOLA) for evolutionary simulations in stateless normal-form games. They demonstrate the approach by evolving 200,000 agents in Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors, and show how advanced MARL learning rules affect social evolution.
The universe involves many independent co-learning agents as an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are typically constrained to small, homogeneous populations and remain computationally intensive. We propose a methodology that enables simulating populations of Reinforcement Learning agents at evolutionary scale. More specifically, we derive a fast, parallelizable implementation of Policy Gradient (PG) and Learning with Opponent-Learning Awareness (LOLA), tailored for evolutionary simulations where agents undergo random pairwise interactions in stateless normal-form games. We demonstrate our approach by simulating the evolution of very large populations of heterogeneous co-learning agents, under both naive and advanced learning strategies. In our experiments, 200,000 PG or LOLA agents evolve in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game provides distinct insights into how populations evolve under these learning rules, including compelling ways in which opponent-learning awareness affects social evolution.
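To make the setting concrete, the following is a minimal sketch of naive PG agents undergoing random pairwise interactions in a stateless Hawk-Dove game. It is not the authors' parallelized implementation and it omits LOLA entirely. The payoff values, the single-logit policy parameterization, the learning rate, the population size, and all function names are assumptions chosen for illustration.

```python
import math
import random

# Hawk-Dove payoffs for the row player. These numbers are hypothetical
# (resource value V = 2, fight cost C = 4), not taken from the paper.
# Actions: 0 = Hawk, 1 = Dove.
PAYOFF = [[-1.0, 2.0],
          [0.0, 1.0]]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pg_gradient(theta_self, theta_other, payoff=PAYOFF):
    """Exact gradient of the expected payoff w.r.t. the agent's own logit.

    Each agent plays Hawk with probability sigmoid(theta). Because the game
    is stateless, the expected payoff and its gradient are available in
    closed form, so no sampling of actions is needed.
    """
    p = sigmoid(theta_self)
    q = sigmoid(theta_other)
    # Expected payoff of each pure action against the opponent's mixed policy.
    u_hawk = payoff[0][0] * q + payoff[0][1] * (1.0 - q)
    u_dove = payoff[1][0] * q + payoff[1][1] * (1.0 - q)
    # d/dtheta [p * u_hawk + (1 - p) * u_dove] = p (1 - p) (u_hawk - u_dove)
    return p * (1.0 - p) * (u_hawk - u_dove)


def simulate(n_agents=200, n_rounds=2000, lr=0.5, seed=0):
    """Randomly pair agents each round; both agents in a pair take one
    naive policy-gradient ascent step against their current opponent."""
    rng = random.Random(seed)
    # Heterogeneous population: every agent starts with its own logit.
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_agents)]
    indices = list(range(n_agents))
    for _ in range(n_rounds):
        rng.shuffle(indices)
        # With an odd population size, the last agent sits out this round.
        for k in range(0, n_agents - 1, 2):
            i, j = indices[k], indices[k + 1]
            # Compute both gradients before updating (simultaneous update).
            g_i = pg_gradient(thetas[i], thetas[j])
            g_j = pg_gradient(thetas[j], thetas[i])
            thetas[i] += lr * g_i
            thetas[j] += lr * g_j
    return thetas


def mean_hawk_probability(thetas):
    return sum(sigmoid(t) for t in thetas) / len(thetas)
```

Calling `mean_hawk_probability(simulate())` gives the population-average probability of playing Hawk after learning. The inner loop over pairs is written sequentially for readability; since each pair's update depends only on its own two members, it is the natural place to parallelize, which is the kind of structure the abstract's "fast, parallelizable implementation" refers to.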