Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics
This work addresses a bottleneck in multi-agent reinforcement learning for complex cooperative tasks, though the contribution appears incremental because it builds on prior methods.
The paper tackles complex cooperative domains in Dec-POMDPs, where existing state-of-the-art algorithms such as WQMIX, QMIX, QTRAN, and VDN fail. It introduces SA2MA, a two-stage algorithm that first solves a single-agent problem and then uses the resulting policy for multi-agent cooperation, and it reports a clear advantage over these competitors.
WQMIX, QMIX, QTRAN, and VDN are state-of-the-art algorithms for Dec-POMDPs. None of them can solve domains that require complex cooperation among agents. We present SA2MA, a two-stage algorithm for such problems. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using the single-agent policy. SA2MA has a clear advantage over all competitors in complex cooperative domains.
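The two-stage idea can be illustrated with a toy sketch. The abstract does not say how SA2MA implements either stage, so everything below is an assumption made for illustration: the 1-D corridor environment, the use of value iteration in place of deep reinforcement learning, and the names `solve_single_agent`, `greedy_action`, and `run_team` are all hypothetical. Stage 1 computes a single-agent value function; stage 2 lets each agent act only on its own position, guided by that function.

```python
from typing import List, Sequence

N_CELLS = 7            # length of a hypothetical 1-D corridor (assumption)
ACTIONS = (-1, 0, 1)   # move left, stay, move right


def step(pos: int, action: int) -> int:
    """Deterministic transition, clamped to the corridor."""
    return min(max(pos + action, 0), N_CELLS - 1)


def solve_single_agent(goal: int, gamma: float = 0.9, sweeps: int = 50) -> List[float]:
    """Stage 1 (assumed): value iteration for one agent, reward -1 per step."""
    values = [0.0] * N_CELLS
    for _ in range(sweeps):
        for s in range(N_CELLS):
            if s != goal:  # the goal is terminal and keeps value 0
                values[s] = max(-1.0 + gamma * values[step(s, a)] for a in ACTIONS)
    return values


def greedy_action(values: Sequence[float], pos: int) -> int:
    """Single-agent policy: pick the action leading to the best-valued cell."""
    return max(ACTIONS, key=lambda a: values[step(pos, a)])


def run_team(starts: Sequence[int], goals: Sequence[int], max_steps: int = 20) -> int:
    """Stage 2 (assumed): each agent sees only its own position and follows
    the stage-1 values for its own goal. Returns steps until all goals are met."""
    heuristics = [solve_single_agent(g) for g in goals]
    positions = list(starts)
    for t in range(max_steps):
        if positions == list(goals):
            return t
        positions = [step(p, greedy_action(h, p)) for p, h in zip(positions, heuristics)]
    return max_steps
```

For example, `run_team([0, 6], [6, 0])` rolls out two agents that cross the corridor in opposite directions. In this toy setting the agents never need to coordinate, so the single-agent policy alone suffices; a real cooperative domain would require further multi-agent learning in the second stage, which this sketch does not attempt.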