Oct 22, 2024

Evolution of Societies via Reinforcement Learning

arXiv:2410.17466v4
h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of scaling MARL simulations for researchers studying evolutionary dynamics, though it is incremental as it adapts existing methods to a new context.

The authors tackled the challenge of simulating large, heterogeneous populations of co-learning agents in multi-agent reinforcement learning (MARL) by developing a fast, parallelizable implementation of Policy Gradient and Opponent-Learning Awareness for evolutionary simulations in stateless normal-form games. They demonstrated this by evolving 200,000 agents in games like Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors, showing how advanced MARL rules impact social evolution.

The universe involves many independent co-learning agents as an ever-evolving part of our observed environment. Yet, in practice, Multi-Agent Reinforcement Learning (MARL) applications are typically constrained to small, homogeneous populations and remain computationally intensive. We propose a methodology that enables simulating populations of Reinforcement Learning agents at evolutionary scale. More specifically, we derive a fast, parallelizable implementation of Policy Gradient (PG) and Opponent-Learning Awareness (LOLA), tailored for evolutionary simulations where agents undergo random pairwise interactions in stateless normal-form games. We demonstrate our approach by simulating the evolution of very large populations made of heterogeneous co-learning agents, under both naive and advanced learning strategies. In our experiments, 200,000 PG or LOLA agents evolve in the classic games of Hawk-Dove, Stag-Hunt, and Rock-Paper-Scissors. Each game provides distinct insights into how populations evolve under both naive and advanced MARL rules, including compelling ways in which Opponent-Learning Awareness affects social evolution.
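To make the setting in the abstract concrete, here is a minimal sketch of naive Policy Gradient learners meeting in random pairwise interactions in a stateless normal-form game. It is not the authors' implementation and leaves out LOLA and any evolutionary selection step. The payoff values (V=2, C=4 for Hawk-Dove), the one-logit-per-agent parameterization, the function names `pg_round` and `simulate`, and all hyperparameters are assumptions chosen for illustration. Because the game is stateless, each agent's expected-payoff gradient against its opponent's mixed strategy has a closed form, so the whole population can be updated with array operations.

```python
import numpy as np

# Row-player payoffs for Hawk-Dove with V=2, C=4 (illustrative values).
# Action 0 = Hawk, action 1 = Dove.
HAWK_DOVE = np.array([[-1.0, 2.0],
                      [0.0, 1.0]])


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def pg_round(theta, payoff, lr, rng):
    """One round: pair agents at random, then take an exact PG step.

    theta[i] is agent i's logit; sigmoid(theta[i]) is its probability
    of playing action 0. The population size must be even.
    """
    n = theta.shape[0]
    assert n % 2 == 0, "population size must be even for pairing"

    # Random pairing: the first half of a permutation meets the second half.
    perm = rng.permutation(n)
    first, second = perm[: n // 2], perm[n // 2:]

    p = sigmoid(theta)      # own probability of action 0
    q = np.empty_like(p)    # opponent's probability of action 0
    q[first] = p[second]
    q[second] = p[first]

    # Expected payoff advantage of action 0 over action 1 against q.
    advantage = ((payoff[0, 0] - payoff[1, 0]) * q
                 + (payoff[0, 1] - payoff[1, 1]) * (1.0 - q))

    # Chain rule: d(expected payoff)/d(theta) = p * (1 - p) * advantage.
    grad = p * (1.0 - p) * advantage
    return theta + lr * grad


def simulate(n_agents=2000, n_rounds=500, lr=0.5, seed=0):
    """Return the mean Hawk probability before and after learning."""
    rng = np.random.default_rng(seed)
    # Heterogeneous start, biased toward Hawk (an arbitrary choice).
    theta = 2.0 + 0.5 * rng.standard_normal(n_agents)
    start = sigmoid(theta).mean()
    for _ in range(n_rounds):
        theta = pg_round(theta, HAWK_DOVE, lr, rng)
    return start, sigmoid(theta).mean()


if __name__ == "__main__":
    start, end = simulate()
    print(f"mean P(Hawk): start={start:.3f}, end={end:.3f}")
```

With these payoffs the advantage of Hawk reduces to 1 - 2q, so the population-average Hawk probability is pushed toward V/C = 0.5, the mixed equilibrium of this Hawk-Dove instance. Swapping in a different 2x2 payoff matrix only changes the `payoff` argument; a three-action game such as Rock-Paper-Scissors would need a softmax policy rather than a single logit.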

Code Implementations: 1 repo
