LG MA | Jul 17, 2023

Meta-Value Learning: a General Framework for Learning with Learning Awareness

Tim Cooijmans, Milad Aghajohari, Aaron Courville

arXiv:2307.08863v3 | 10.77 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses how learning agents in multi-agent systems should account for each other's learning, but it appears incremental because it builds on prior methods such as LOLA.

The paper tackles the difficulty of gradient-based learning in multi-agent systems by proposing a framework that judges joint policies by their long-term meta-value. It applies a form of Q-learning to the meta-game of optimization without explicitly representing the continuous action space of policy updates. The authors present the resulting method, MeVa, as consistent and far-sighted, and compare it to prior work on repeated matrix games.

Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum over the returns of future optimization iterates. We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates. The resulting method, MeVa, is consistent and far-sighted, and does not require REINFORCE estimators. We analyze the behavior of our method on a toy game and compare to prior work on repeated matrix games.
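The abstract describes the meta-value as a discounted sum over the returns of future optimization iterates. The sketch below illustrates only that quantity, under several assumptions that are not from the paper: a hypothetical two-player game with scalar policies (`returns`), naive simultaneous gradient ascent as the optimization process (`naive_step`), a finite horizon in place of the infinite sum, and a sum that starts at the current iterate. It is not the MeVa algorithm, which learns the meta-value rather than computing it by rollout.

```python
from typing import Tuple

Joint = Tuple[float, float]  # joint policy: one scalar parameter per player


def returns(x: Joint) -> Joint:
    """Hypothetical toy game (an assumption, not taken from the paper)."""
    x1, x2 = x
    f1 = x1 * x2 - 0.5 * x1 ** 2
    f2 = -x1 * x2 - 0.5 * x2 ** 2
    return (f1, f2)


def naive_step(x: Joint, alpha: float = 0.1) -> Joint:
    """One step of naive learning: each player ascends its own return,
    treating the other player's parameter as fixed."""
    x1, x2 = x
    g1 = x2 - x1    # d f1 / d x1
    g2 = -x1 - x2   # d f2 / d x2
    return (x1 + alpha * g1, x2 + alpha * g2)


def truncated_meta_value(
    x: Joint, gamma: float = 0.9, horizon: int = 50, alpha: float = 0.1
) -> Joint:
    """Discounted sum of each player's returns over `horizon` optimization
    iterates, starting from the joint policy `x`."""
    v1 = v2 = 0.0
    discount = 1.0
    for _ in range(horizon):
        f1, f2 = returns(x)
        v1 += discount * f1
        v2 += discount * f2
        discount *= gamma
        x = naive_step(x, alpha)
    return (v1, v2)


if __name__ == "__main__":
    x0 = (1.0, -0.5)
    print("immediate returns:", returns(x0))
    print("truncated meta-values:", truncated_meta_value(x0))
```

With a horizon of 1 the function reduces to the immediate returns, and in general the truncated sum satisfies V_T(x) = f(x) + gamma * V_{T-1}(x'), where x' is the next iterate. A recursion of this shape is what a value-learning method could bootstrap from instead of unrolling the optimization.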
