LG MA ML | Apr 29, 2024

Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Laixi Shi, Eric Mazumdar, Yuejie Chi, Adam Wierman

arXiv:2404.18909v320.323 citations | h-index: 9 | ICML

Originality: Incremental advance

AI Analysis

This work addresses the sim-to-real gap in multi-agent reinforcement learning. The advance is incremental: it extends robust RL from the single-agent to the multi-agent setting, with theoretical guarantees.

The paper tackles the problem of learning robust policies in multi-agent environments under environmental uncertainty. It introduces distributionally robust Markov games and proposes a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees. The resulting sample complexity is near-optimal with respect to factors such as the size of the state space and the target accuracy.

To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DRNVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DRNVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
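The worst-case objective described above can be illustrated with a small sketch. This is not DRNVI: it is a single-agent simplification that omits the per-step equilibrium computation and the sampling from a generative model. It also assumes a total-variation uncertainty set of radius `sigma` around a known nominal kernel, which the abstract does not specify. The function names and array shapes are hypothetical.

```python
import numpy as np

def worst_case_expectation(p, v, sigma):
    """Minimize q @ v over distributions q with TV(q, p) <= sigma.

    Assumption: the uncertainty set is a total-variation ball around p.
    The minimizer shifts up to sigma probability mass from the
    highest-value states onto a lowest-value state.
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    q = p.copy()
    lowest = int(np.argmin(v))
    budget = float(sigma)
    for s in np.argsort(-v):  # visit highest-value states first
        if s == lowest or budget <= 0:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        q[lowest] += moved
        budget -= moved
    return float(q @ v)

def robust_value_iteration(P, r, sigma, horizon):
    """Finite-horizon robust backup for one agent (illustrative only).

    P: nominal transitions, shape (n_states, n_actions, n_states)
    r: rewards, shape (n_states, n_actions)
    """
    n_states, n_actions = r.shape
    V = np.zeros(n_states)
    for _ in range(horizon):
        Q = np.array([
            [r[s, a] + worst_case_expectation(P[s, a], V, sigma)
             for a in range(n_actions)]
            for s in range(n_states)
        ])
        V = Q.max(axis=1)  # agent maximizes its worst-case return
    return V
```

Setting `sigma = 0` recovers standard value iteration on the nominal model; increasing `sigma` can only lower the resulting values, which is the price paid for robustness in this sketch.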
