LG · AI · MA · May 1, 2024

MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

arXiv:2405.00902v1 · 4 citations · h-index: 3 · AAMAS
Originality: Incremental advance
AI Analysis

The paper addresses the problem of finding a Pareto optimal Nash Equilibrium in cooperative multi-agent learning, which is relevant to both researchers and practitioners. The advance is incremental because the method builds on existing off-policy MARL algorithms.

The paper tackles inefficient exploration in multi-agent reinforcement learning, a problem that is most severe in sparse-reward settings. It introduces MESA, a meta-exploration method that learns a set of diverse exploration policies from training tasks. With these policies, the authors report significantly better performance on sparse-reward tasks in multi-agent particle and multi-agent MuJoCo environments.

Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.
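The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical one-step, two-agent matrix game in which reward is given only when both agents pick the same "good" action, and it replaces the learned exploration policies with uniform sampling over the identified subspace. All names (`make_task`, `high_reward_subspace`, `exploration_policy`, `collect`) are invented for illustration.

```python
import random

N_ACTIONS = 5  # actions per agent in the hypothetical matrix game


def make_task(good_action):
    """Hypothetical sparse-reward task: reward 1 only if both agents pick good_action."""
    def reward(a1, a2):
        return 1.0 if a1 == a2 == good_action else 0.0
    return reward


def high_reward_subspace(tasks, threshold=0.5):
    """Stage 1 (toy version): keep joint actions rewarded in at least one training task."""
    found = set()
    for reward in tasks:
        for a1 in range(N_ACTIONS):
            for a2 in range(N_ACTIONS):
                if reward(a1, a2) >= threshold:
                    found.add((a1, a2))
    return sorted(found)


def exploration_policy(subspace, rng):
    """Stage 2 stand-in: sample uniformly so that the whole subspace is covered."""
    return rng.choice(subspace)


def collect(reward, subspace, rng, episodes=200, explore_prob=0.5):
    """Fill a replay buffer by mixing structured exploration with uniform random actions.

    An off-policy learner could train on this buffer regardless of which
    behaviour produced each transition.
    """
    buffer = []
    for _ in range(episodes):
        if rng.random() < explore_prob:
            a1, a2 = exploration_policy(subspace, rng)
        else:
            a1, a2 = rng.randrange(N_ACTIONS), rng.randrange(N_ACTIONS)
        buffer.append((a1, a2, reward(a1, a2)))
    return buffer


# Training tasks have good actions 0..3; the test task reuses good action 2.
train_tasks = [make_task(g) for g in range(N_ACTIONS - 1)]
subspace = high_reward_subspace(train_tasks)
test_buffer = collect(make_task(2), subspace, random.Random(0))
rewarded = sum(1 for _, _, r in test_buffer if r > 0)
```

In this toy setting the identified subspace is the part of the diagonal seen during training, so structured exploration only tries 4 joint actions instead of all 25. The sketch does not cover multi-step states or the generalization to harder test tasks that the paper reports.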

