LGMAOCNov 20, 2023

Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes

arXiv:2311.11789v23 citationsh-index: 5
Originality Incremental advance
AI Analysis

This addresses scalability issues for researchers and practitioners in multi-agent reinforcement learning, though it is incremental as it builds on decentralized approaches with approximations.

The paper tackles the computational challenge of policy iteration in cooperative multi-agent MDPs, where the number of actions grows exponentially with agents, by proposing approximate decentralized policy iteration algorithms using linear programming with function approximation, and demonstrates advantages over existing state-of-the-art methods.

In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents. At each decision epoch, all the m agents independently select actions in order to maximize a common long-term objective. In the policy iteration process of multi-agent setup, the number of actions grows exponentially with the number of agents, incurring huge computational costs. Thus, recent works consider decentralized policy improvement, where each agent improves its decisions unilaterally, assuming that the decisions of the other agents are fixed. However, exact value functions are considered in the literature, which is computationally expensive for a large number of agents with high dimensional state-action space. Thus, we propose approximate decentralized policy iteration algorithms, using approximate linear programming with function approximation to compute the approximate value function for decentralized policy improvement. Further, we consider (both) cooperative multi-agent finite and infinite horizon discounted MDPs and propose suitable algorithms in each case. Moreover, we provide theoretical guarantees for our algorithms and also demonstrate their advantages over existing state-of-the-art algorithms in the literature.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes