LG AI MA ML · Oct 3, 2019

Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama

arXiv:1910.01465v2 · 16.5178 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses inefficient policy learning in multi-agent domains, particularly in mixed cooperative-competitive tasks, with potential applications to high-dimensional robotic tasks.

The paper tackles the problem of value function overestimation bias in multi-agent reinforcement learning by proposing an approach using double centralized critics, showing a significant advantage over current methods on six mixed cooperative-competitive tasks.

Many real-world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus investigate the presence of a common weakness in single-agent RL, namely value function overestimation bias, in the multi-agent setting. Based on our findings, we propose an approach that reduces this bias by using double centralized critics. We evaluate it on six mixed cooperative-competitive tasks, showing a significant advantage over current methods. Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.
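The abstract does not spell out how the two centralized critics are combined, so the following is only a minimal sketch under one common assumption: a clipped double-Q style target that takes the minimum of the two critics' estimates, with each critic scoring the joint observations and joint actions of all agents. The names `double_critic_target` and `estimate_bias`, the callable critic signature, and the Gaussian noise model are hypothetical and chosen for illustration. The second function is a toy simulation of why maximizing over noisy value estimates overestimates, and how the minimum counteracts that.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical critic signature: maps (joint observations, joint actions) to a scalar value.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def double_critic_target(
    reward: float,
    done: bool,
    gamma: float,
    critic_a: Critic,
    critic_b: Critic,
    next_joint_obs: Sequence[float],
    next_joint_actions: Sequence[float],
) -> float:
    """TD target using the smaller of two centralized critic estimates (assumed form)."""
    q_a = critic_a(next_joint_obs, next_joint_actions)
    q_b = critic_b(next_joint_obs, next_joint_actions)
    # No bootstrapping from the next state once the episode has ended.
    return reward + gamma * (1.0 - float(done)) * min(q_a, q_b)


def estimate_bias(
    n_actions: int = 5,
    noise_std: float = 1.0,
    trials: int = 5000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Toy experiment: every action has true value 0, critics see it through Gaussian noise.

    Returns the average estimated value of the greedy action using
    (single critic, minimum of two independent critics).
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        q1 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        # The greedy action is picked with the first critic's noisy estimates.
        best = max(range(n_actions), key=q1.__getitem__)
        single_total += q1[best]                 # biased upward by the max
        double_total += min(q1[best], q2[best])  # pulled back down by the second critic
    return single_total / trials, double_total / trials


if __name__ == "__main__":
    single, double = estimate_bias()
    print(f"single-critic bias: {single:.3f}, double-critic bias: {double:.3f}")
```

In this toy setting the single-critic estimate is clearly positive even though every true value is zero, while the minimum of two critics removes the upward bias at the cost of a mild underestimate. Whether the paper uses exactly this target is not stated in the summary above.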

