LG · MA · Jul 17, 2023

Meta-Value Learning: a General Framework for Learning with Learning Awareness

arXiv:2307.08863v3 · 7 citations · h-index: 17
Originality: Incremental advance
AI Analysis

The paper addresses how agents in a multi-agent system should account for one another's learning. The contribution appears incremental, since it builds on prior methods such as LOLA.

The paper tackles gradient-based learning in multi-agent systems by judging joint policies by their long-term meta-value. It learns this value with a form of Q-learning that avoids explicitly representing the continuous action space of policy updates. The resulting method, MeVa, is described as consistent and far-sighted, and is compared to prior work on repeated matrix games.

Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum over the returns of future optimization iterates. We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates. The resulting method, MeVa, is consistent and far-sighted, and does not require REINFORCE estimators. We analyze the behavior of our method on a toy game and compare to prior work on repeated matrix games.
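To make "a discounted sum over the returns of future optimization iterates" concrete, here is a minimal sketch that computes a truncated version of such a sum by rolling out an optimizer. It is not the MeVa method itself, which learns the meta-value with a form of Q-learning rather than by rollout. The function names, the two-agent toy game, the naive gradient-ascent update, and the values of `gamma`, `horizon`, and `lr` are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

Joint = Tuple[float, ...]  # joint policy parameters, one scalar per agent (toy assumption)


def truncated_meta_value(
    returns_fn: Callable[[Joint], Sequence[float]],
    update_fn: Callable[[Joint], Joint],
    joint: Joint,
    gamma: float = 0.9,
    horizon: int = 50,
) -> Tuple[float, ...]:
    """Discounted sum of each agent's returns over `horizon` optimization iterates."""
    totals = [0.0] * len(returns_fn(joint))
    discount = 1.0
    for _ in range(horizon):
        # Accumulate the discounted return of the current iterate for every agent.
        for i, r in enumerate(returns_fn(joint)):
            totals[i] += discount * r
        # Move to the next optimization iterate.
        joint = update_fn(joint)
        discount *= gamma
    return tuple(totals)


def toy_returns(joint: Joint) -> Tuple[float, float]:
    """Hypothetical two-agent game, chosen only because its gradients are simple."""
    x, y = joint
    return (-(x - y) ** 2, -(x + y) ** 2)


def naive_step(joint: Joint, lr: float = 0.1) -> Joint:
    """Assumed optimizer: each agent ascends its own return w.r.t. its own parameter."""
    x, y = joint
    dx = -2.0 * (x - y)  # d/dx of -(x - y)^2
    dy = -2.0 * (x + y)  # d/dy of -(x + y)^2
    return (x + lr * dx, y + lr * dy)


if __name__ == "__main__":
    print(truncated_meta_value(toy_returns, naive_step, (1.0, 0.5)))
```

Comparing two joint policies by this quantity, rather than by their immediate returns, is what "long-term prospects" refers to here: a policy with a lower return now can still score higher if the iterates that follow it improve quickly.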

Code Implementations: 1 repo
