GT LG | Mar 31, 2023

Soft-Bellman Equilibrium in Affine Markov Games: Forward Solutions and Inverse Learning

Shenghui Chen, Yue Yu, David Fridovich-Keil, Ufuk Topcu

arXiv:2304.00163v22.32 citations | h-index: 53 | Has Code

Originality Highly original

AI Analysis

This work addresses the challenge of inferring reward parameters in multi-agent systems, offering a novel solution concept that replaces the purely rational policies of the Nash equilibrium with soft-Bellman policies for boundedly rational players.

The paper models interactions in Markov games with affine reward functions by introducing a soft-Bellman equilibrium for boundedly rational players. In experiments in a predator-prey environment, the reward parameters inferred by the proposed inverse learning algorithm reduce the Kullback-Leibler divergence between equilibrium and observed policies by at least two orders of magnitude relative to a baseline algorithm.

Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.
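To make the notion of a soft-Bellman policy and the Kullback-Leibler metric more concrete, here is a minimal single-player sketch in plain Python. It is not the paper's multi-player algorithm: it assumes the common entropy-regularized form in which the policy is a softmax over Q-values and the value is their log-sum-exp, and the function names (`soft_bellman_policy`, `soft_value_iteration`, `kl_divergence`) and the toy two-state problem are hypothetical, chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def soft_bellman_policy(q_values):
    """Softmax over Q-values (assumed form of a soft-Bellman policy)."""
    lse = logsumexp(q_values)
    return [math.exp(q - lse) for q in q_values]


def soft_value_iteration(rewards, transitions, gamma=0.9, iters=200):
    """Single-player soft value iteration on a small tabular problem.

    rewards[s][a] is the reward for action a in state s;
    transitions[s][a][t] is the probability of moving from s to t under a.
    Returns the Q-table and the soft values V(s) = logsumexp_a Q(s, a).
    """
    n_states = len(rewards)
    v = [0.0] * n_states
    q = [[0.0] * len(rewards[s]) for s in range(n_states)]
    for _ in range(iters):
        q = [
            [
                rewards[s][a]
                + gamma * sum(p * v[t] for t, p in enumerate(transitions[s][a]))
                for a in range(len(rewards[s]))
            ]
            for s in range(n_states)
        ]
        v = [logsumexp(q[s]) for s in range(n_states)]
    return q, v


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with matching support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical two-state, two-action problem (illustrative numbers only).
rewards = [[1.0, 0.0], [0.0, 2.0]]
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
q_table, values = soft_value_iteration(rewards, transitions)
policy = [soft_bellman_policy(q_s) for q_s in q_table]

# Compare the computed policy in state 0 against a stand-in "observed" policy.
observed = [0.5, 0.5]
divergence = kl_divergence(observed, policy[0])
```

In this sketch the softmax keeps every action's probability strictly positive, which is one way to read "boundedly rational": better actions are more likely but never chosen with certainty. In the multi-player setting described above, each player's Q-values would additionally depend on the other players' policies, which this toy example does not model.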
