AI | Aug 27, 2024

On Stateful Value Factorization in Multi-Agent Reinforcement Learning

Enrico Marchesini, Andrea Baisero, Rupali Bhati, Christopher Amato

arXiv:2408.15381v2 | 9.68 citations | h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses a foundational issue in multi-agent reinforcement learning by reconnecting the theory of value factorization with its practical implementations. The advance is incremental because it builds on existing factorization methods.

The paper tackles the mismatch between theory and practice in value factorization for multi-agent reinforcement learning. It formally analyzes the use of state instead of history in current methods and introduces DuelMIX, a factorization algorithm that learns distinct per-agent utility estimators to improve performance and achieve full expressiveness.

Value factorization is a popular paradigm for designing scalable multi-agent reinforcement learning algorithms. However, current factorization methods make choices without full justification that may limit their performance. For example, the theory in prior work uses stateless (i.e., history) functions, while the practical implementations use state information -- making the motivating theory a mismatch for the implementation. Also, methods have built off of previous approaches, inheriting their architectures without exploring other, potentially better ones. To address these concerns, we formally analyze the theory of using the state instead of the history in current methods -- reconnecting theory and practice. We then introduce DuelMIX, a factorization algorithm that learns distinct per-agent utility estimators to improve performance and achieve full expressiveness. Experiments on StarCraft II micromanagement and Box Pushing tasks demonstrate the benefits of our intuitions.
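As a rough illustration of what "value factorization" means in the abstract above, the sketch below uses the simplest additive (VDN-style) form, in which the joint value is the sum of per-agent utilities. This is not DuelMIX and is not taken from the paper: the function names are hypothetical, and the fixed NumPy arrays stand in for learned, history-conditioned utility networks. It only shows why such a factorization permits decentralized action selection: the joint greedy action coincides with each agent's individual greedy action.

```python
import itertools

import numpy as np


def additive_joint_q(per_agent_q, joint_action):
    """Joint value as the sum of each agent's utility for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def decentralized_greedy(per_agent_q):
    """Each agent independently picks the action maximizing its own utility."""
    return tuple(int(np.argmax(q)) for q in per_agent_q)


def centralized_greedy(per_agent_q):
    """Brute-force search over all joint actions for the best joint value."""
    action_ranges = (range(len(q)) for q in per_agent_q)
    return max(
        itertools.product(*action_ranges),
        key=lambda joint_action: additive_joint_q(per_agent_q, joint_action),
    )


# Hypothetical utilities for two agents with three actions each.
rng = np.random.default_rng(0)
per_agent_q = [rng.normal(size=3) for _ in range(2)]

# Under an additive factorization the two selection rules agree.
assert decentralized_greedy(per_agent_q) == centralized_greedy(per_agent_q)
```

The brute-force search grows exponentially with the number of agents, while the decentralized rule stays linear, which is the scalability argument usually made for factorization methods.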
