OC AI GT MA SY · May 16, 2021

Robust optimal policies for team Markov games

arXiv:2105.07405v2 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses robustness in cooperative multi-agent decision-making under uncertainty, representing an incremental improvement over existing methods.

The paper tackles the sensitivity of optimal policies to uncertain parameters in team Markov games. It proposes a robust model and a robust iterative learning algorithm with a convergence proof. Compared with robust dynamic programming, the algorithm converges faster and admits approximate computation, which alleviates the curse of dimensionality.

In stochastic dynamic environments, team Markov games have emerged as a versatile paradigm for studying sequential decision-making problems of fully cooperative multi-agent systems. However, the optimality of the derived policies is usually sensitive to model parameters, which are typically unknown and required to be estimated from noisy data in practice. To mitigate the sensitivity of optimal policies to these uncertain parameters, we propose a robust model of team Markov games in this paper, where agents utilize robust optimization approaches to update strategies. This model extends team Markov games to the scenario of incomplete information and meanwhile provides an alternative solution concept of robust team optimality. To seek such a solution, we develop a robust iterative learning algorithm of team policies and prove its convergence. This algorithm, compared with robust dynamic programming, not only possesses a faster convergence rate, but also allows for using approximation calculations to alleviate the curse of dimensionality. Moreover, some numerical simulations are presented to demonstrate the effectiveness of the algorithm by generalizing the game model of sequential social dilemmas to uncertain scenarios.
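To make the robust solution concept concrete, the following is a minimal sketch of the robust dynamic programming baseline that the abstract compares against. It is not the paper's robust iterative learning algorithm. It assumes a finite uncertainty set of candidate models, a worst case taken independently for each state and joint action, and a single shared team reward; the function names and the two-state example are hypothetical.

```python
from itertools import product


def robust_q(s, a, V, models, gamma):
    """Worst-case one-step value of joint action index `a` in state `s`,
    minimised over the finite set of candidate (P, R) models."""
    return min(
        R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
        for P, R in models
    )


def robust_value_iteration(n_states, n_joint, models, gamma=0.9,
                           tol=1e-8, max_iter=10_000):
    """Max over joint actions of the min over models, iterated to a fixed point."""
    V = [0.0] * n_states
    for _ in range(max_iter):
        new_V = [
            max(robust_q(s, a, V, models, gamma) for a in range(n_joint))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    greedy = [
        max(range(n_joint), key=lambda a: robust_q(s, a, V, models, gamma))
        for s in range(n_states)
    ]
    return V, greedy


# Hypothetical example: two agents with actions {0, 1}, two states.
joint_actions = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
stay0, to1 = [1.0, 0.0], [0.0, 1.0]              # next-state distributions

# Shared team reward R[s][a]; state 1 is absorbing with zero reward.
R = [[1.0, 0.0, 0.0, 3.0], [0.0] * 4]
# Model A: every joint action keeps the team in state 0.
P_a = [[stay0, stay0, stay0, stay0], [to1] * 4]
# Model B: joint action (1,1) sends the team to the absorbing state.
P_b = [[stay0, stay0, stay0, to1], [to1] * 4]

V_rob, g_rob = robust_value_iteration(2, 4, [(P_a, R), (P_b, R)])
V_nom, g_nom = robust_value_iteration(2, 4, [(P_a, R)])
robust_policy = [joint_actions[a] for a in g_rob]
nominal_policy = [joint_actions[a] for a in g_nom]
```

In this toy instance the policy that is optimal under model A alone chooses the high-reward joint action (1,1) in state 0, while the robust policy chooses (0,0), because (1,1) may lead to the zero-reward absorbing state under model B. The sweep enumerates every joint action in every state, so its cost grows exponentially with the number of agents. That is the curse of dimensionality the abstract refers to.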
