Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
This work addresses communication efficiency and theoretical foundations in cooperative multi-agent reinforcement learning with homogeneous agents.
The paper addresses two problems in cooperative multi-agent reinforcement learning: the communication cost of centralized training and the lack of theoretical justification for policy sharing. It characterizes a subclass of homogeneous Markov games in which policy sharing provably incurs no suboptimality, then develops a consensus-based decentralized actor-critic method with convergence guarantees, along with practical algorithms that reduce communication while maintaining performance comparable to centralized training.
Recent success in cooperative multi-agent reinforcement learning (MARL) relies on centralized training and policy sharing. Centralized training eliminates the issue of non-stationarity in MARL yet induces large communication costs, and policy sharing is empirically crucial to efficient learning in certain tasks yet lacks theoretical justification. In this paper, we formally characterize a subclass of cooperative Markov games where agents exhibit a certain form of homogeneity such that policy sharing provably incurs no suboptimality. This enables us to develop the first consensus-based decentralized actor-critic method where the consensus update is applied to both the actors and the critics while ensuring convergence. We also develop practical algorithms based on our decentralized actor-critic method to reduce the communication cost during training, while still yielding policies comparable to those obtained with centralized training.
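To make the idea of a consensus update concrete, the following is a minimal sketch of generic consensus averaging applied to both actor and critic parameters. It is not the paper's algorithm: the ring topology, the mixing weights, the parameter dimensions, and the helper names `ring_weights` and `consensus_step` are all assumptions chosen for illustration, and the local gradient steps that an actor-critic method would interleave with the averaging are omitted.

```python
import numpy as np


def ring_weights(n, self_weight=0.5):
    """Doubly stochastic mixing matrix for n agents on a ring (assumed topology)."""
    W = np.zeros((n, n))
    side = (1.0 - self_weight) / 2.0
    for i in range(n):
        W[i, i] += self_weight
        W[i, (i - 1) % n] += side  # left neighbor
        W[i, (i + 1) % n] += side  # right neighbor
    return W


def consensus_step(params, W):
    """One consensus update: each agent takes a weighted average of its
    own parameters and those of its neighbors (one row of params per agent)."""
    return W @ params


rng = np.random.default_rng(0)
n_agents, dim = 5, 3  # hypothetical sizes

# Each agent starts with its own actor and critic parameter vector.
actor = rng.normal(size=(n_agents, dim))
critic = rng.normal(size=(n_agents, dim))
actor_mean = actor.mean(axis=0)

W = ring_weights(n_agents)

# Repeated averaging only; a real method would alternate this with
# local actor and critic gradient updates.
for _ in range(200):
    actor = consensus_step(actor, W)
    critic = consensus_step(critic, W)

# Largest remaining disagreement between any agent and the initial average.
print(np.abs(actor - actor_mean).max())
```

Because the mixing matrix is doubly stochastic, the average of the agents' parameters is preserved at every step, and on a connected graph repeated averaging drives all agents toward that common value. Each agent only exchanges parameters with its immediate neighbors, which is the source of the communication savings relative to a central coordinator.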