LG, MA | Aug 30, 2021

Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning

arXiv:2108.12988v3 | 8.48 citations

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of building adaptable agents for multi-agent systems, though the advance is incremental because it builds on existing multi-agent reinforcement learning methods.

The paper tackles the problem of generalizing agents in multi-agent reinforcement learning across games with varying numbers of participants. It proposes Meta Representations for Agents (MRA) to model game-common and game-specific strategic knowledge, and reports improved training performance and generalization ability in evaluation games.

In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number. Every single MG induced by varying the population may possess distinct optimal joint strategies and game-specific knowledge, which are modeled independently in modern multi-agent reinforcement learning algorithms. In this work, our focus is on creating agents that can generalize across population-varying MGs. Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games. To achieve this, we propose Meta Representations for Agents (MRA) that explicitly models the game-common and game-specific strategic knowledge. By representing the policy sets with multi-modal latent policies, the game-common strategic knowledge and diverse strategic modes are discovered through an iterative optimization procedure. We prove that by approximately maximizing the resulting constrained mutual information objective, the policies can reach Nash Equilibrium in every evaluation MG when the latent space is sufficiently large. When deploying MRA in practical settings with limited latent space sizes, fast adaptation can be achieved by leveraging the first-order gradient information. Extensive experiments demonstrate the effectiveness of MRA in improving training performance and generalization ability in challenging evaluation games.
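To make the idea of a policy set with distinct strategic modes more concrete, here is a minimal sketch that is not taken from the paper. It assumes a discrete latent variable with a small number of modes, a discrete action space, and one softmax policy per latent mode, and it computes the mutual information between the latent mode and the chosen action. The paper's objective is a constrained mutual information defined over games and policies, so this toy quantity only illustrates why such a term rewards modes that behave differently. All function names here are hypothetical.

```python
import math


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def latent_policy_set(logits_per_latent):
    """One action distribution per latent mode (assumed parameterization)."""
    return [softmax(logits) for logits in logits_per_latent]


def mutual_information(policies, prior=None):
    """I(Z; A) = H(A) - H(A | Z) for a discrete latent Z and action A."""
    k = len(policies)
    if prior is None:
        prior = [1.0 / k] * k  # assumption: uniform prior over latent modes
    n_actions = len(policies[0])
    marginal = [
        sum(prior[z] * policies[z][a] for z in range(k))
        for a in range(n_actions)
    ]
    conditional = sum(prior[z] * entropy(policies[z]) for z in range(k))
    return entropy(marginal) - conditional


# Two modes that act identically carry no information about the latent.
same = latent_policy_set([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
mi_same = mutual_information(same)

# Two modes that pick different actions almost deterministically.
distinct = latent_policy_set([[50.0, 0.0], [0.0, 50.0]])
mi_distinct = mutual_information(distinct)
```

In this toy setting, identical modes give a mutual information of zero, while two near-deterministic modes that choose different actions approach log 2 nats, the maximum for two equally likely modes.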
