Self-Organized Polynomial-Time Coordination Graphs
This addresses a critical computational bottleneck in multi-agent systems, enabling more efficient coordination for tasks like robotics or gaming, though it is an incremental improvement over existing coordination graph methods.
The paper tackles the NP-hard complexity of greedy action selection in coordination graphs for multi-agent reinforcement learning by proposing SOP-CG, a method that uses structured graph classes to ensure accurate and computationally efficient action selection, improving performance across various cooperative tasks.
Coordination graph is a promising approach to model agent collaboration in multi-agent reinforcement learning. It conducts a graph-based value factorization and induces explicit coordination among agents to complete complicated tasks. However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values. It refers to the decentralized constraint optimization problem (DCOP), which and whose constant-ratio approximation are NP-hard problems. To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the accuracy and the computational efficiency of collaborated action selection. SOP-CG employs dynamic graph topology to ensure sufficient value function expressiveness. The graph selection is unified into an end-to-end learning paradigm. In experiments, we show that our approach learns succinct and well-adapted graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.