Multiagent Soft Q-Learning
This work addresses coordination failures in multiagent systems and is relevant to both researchers and practitioners, though the contribution is incremental: it adapts an existing method to a specific bottleneck.
The paper tackles relative overgeneralization in multiagent reinforcement learning for continuous games by proposing Multiagent Soft Q-learning. In cooperative tasks, the method outperforms MADDPG, a state-of-the-art approach, achieving better coordination and converging to better local optima.
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
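To make relative overgeneralization concrete, the following is a minimal sketch of a two-agent cooperative game with a shared reward. The reward surface, its parameters, and the names `joint_reward`, `preferred_a1`, and `joint_optimum` are hypothetical choices made for illustration; they are not the paper's experimental settings. The sketch only shows the pathology itself: when one agent scores its own action by averaging over a teammate who acts uniformly at random, the broad, low-reward region can look better than the narrow, high-reward joint optimum. It does not implement Multiagent Soft Q-learning or MADDPG.

```python
import numpy as np


def joint_reward(a1, a2):
    """Shared reward: maximum of a broad, low bump and a narrow, high bump.

    All constants here are illustrative assumptions.
    """
    # Broad bump centred at (-5, -5) with peak value 0.
    broad = -0.8 * (((a1 + 5.0) / 3.0) ** 2 + ((a2 + 5.0) / 3.0) ** 2)
    # Narrow bump centred at (5, 5) with peak value 10.
    narrow = 10.0 - (((a1 - 5.0) / 0.5) ** 2 + ((a2 - 5.0) / 0.5) ** 2)
    return np.maximum(broad, narrow)


# Discretise each agent's one-dimensional action space.
actions = np.linspace(-10.0, 10.0, 201)

# rewards[i, j] is the shared reward when agent 1 plays actions[i]
# and agent 2 plays actions[j].
rewards = joint_reward(actions[:, None], actions[None, :])

# The true joint optimum over the grid.
i, j = np.unravel_index(np.argmax(rewards), rewards.shape)
joint_optimum = (actions[i], actions[j])

# Agent 1's view if agent 2 acts uniformly at random:
# average the reward over all of agent 2's actions.
avg_reward_a1 = rewards.mean(axis=1)
preferred_a1 = actions[np.argmax(avg_reward_a1)]

print("joint optimum:", joint_optimum)
print("agent 1's preferred action under a uniform teammate:", preferred_a1)
```

Under these assumed parameters, the joint optimum sits on the narrow bump, while the action that maximizes agent 1's averaged reward sits on the broad bump. Miscoordinating near the narrow bump is penalized heavily for most of the teammate's actions, which is what pulls the averaged estimate toward the suboptimal region.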