OC · RO · Oct 3, 2021

Maximum-Entropy Multi-Agent Dynamic Games: Forward and Inverse Solutions

arXiv:2110.01027v1 · 18.686 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of modeling boundedly rational agents in multi-agent systems. The advance is incremental: it extends maximum-entropy principles from the single-agent setting to multiple agents.

The paper models multiple stochastic agents in dynamic games by introducing a new stochastic Nash equilibrium concept, the Entropic Cost Equilibrium (ECE). It develops algorithms for both forward computation of ECE policies and inverse inference of cost functions. In a simulated multi-agent collision avoidance scenario and on the INTERACTION traffic dataset, it reports more accurate learned cost models than standard inverse optimal control methods.

In this paper, we study the problem of multiple stochastic agents interacting in a dynamic game scenario with continuous state and action spaces. We define a new notion of stochastic Nash equilibrium for boundedly rational agents, which we call the Entropic Cost Equilibrium (ECE). We show that ECE is a natural extension to multiple agents of Maximum Entropy optimality for single agents. We solve both the "forward" and "inverse" problems for the multi-agent ECE game. For the forward problem, we provide a Riccati algorithm to compute closed-form ECE feedback policies for the agents, which are exact in the Linear-Quadratic-Gaussian case. We give an iterative variant to find locally ECE feedback policies for the nonlinear case. For the inverse problem, we present an algorithm to infer the cost functions of the multiple interacting agents given noisy, boundedly rational input and state trajectory examples from agents acting in an ECE. The effectiveness of our algorithms is demonstrated in a simulated multi-agent collision avoidance scenario, and with data from the INTERACTION traffic dataset. In both cases, we show that, by taking into account the agents' game theoretic interactions using our algorithm, a more accurate model of agents' costs can be learned, compared with standard inverse optimal control methods.
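To make the forward problem concrete, here is a minimal sketch of a coupled Riccati-style backward recursion for a two-agent, finite-horizon linear-quadratic game with Gaussian (maximum-entropy) feedback policies. It is an illustration under stated assumptions, not the paper's algorithm: the function name `maxent_lq_game`, the cost convention (each agent i pays ½xᵀQᵢx + ½uᵢᵀRᵢuᵢ per step, with Qᵢ also used as the terminal cost), the unit temperature, and the example matrices are all hypothetical. Under these assumptions, each agent's policy at a given step is Gaussian with mean −Kᵢx and covariance equal to the inverse of the control Hessian of its quadratic cost-to-go.

```python
import numpy as np

def maxent_lq_game(A, B, Q, R, horizon):
    """Backward recursion for a two-agent LQ game (illustrative sketch).

    Assumed model: x' = A x + B[0] u0 + B[1] u1, with agent i paying
    0.5 x^T Q[i] x + 0.5 u_i^T R[i] u_i per step and temperature 1.
    Returns per-step feedback gains and policy covariances, ordered
    from the first time step to the last.
    """
    m0 = B[0].shape[1]
    P = [q.copy() for q in Q]  # terminal value matrices (assumed equal to Q[i])
    gains, covs = [], []
    for _ in range(horizon):
        # Control Hessians of each agent's quadratic cost-to-go.
        H = [R[i] + B[i].T @ P[i] @ B[i] for i in range(2)]
        # Stack both agents' first-order conditions into one linear system.
        S = np.block([
            [H[0], B[0].T @ P[0] @ B[1]],
            [B[1].T @ P[1] @ B[0], H[1]],
        ])
        rhs = np.vstack([B[0].T @ P[0] @ A, B[1].T @ P[1] @ A])
        K_stacked = np.linalg.solve(S, rhs)
        K = [K_stacked[:m0], K_stacked[m0:]]
        # Closed-loop dynamics under the mean policies u_i = -K_i x.
        A_cl = A - B[0] @ K[0] - B[1] @ K[1]
        P = [Q[i] + K[i].T @ R[i] @ K[i] + A_cl.T @ P[i] @ A_cl
             for i in range(2)]
        gains.append(K)
        # Gaussian policy covariance: inverse of the control Hessian.
        covs.append([np.linalg.inv(h) for h in H])
    gains.reverse()
    covs.reverse()
    return gains, covs

# Hypothetical example: a 2-state system with one scalar input per agent.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = [np.array([[0.0], [0.1]]), np.array([[0.1], [0.0]])]
Q = [np.eye(2), np.eye(2)]
R = [np.array([[1.0]]), np.array([[1.0]])]

gains, covs = maxent_lq_game(A, B, Q, R, horizon=5)
# Sample agent 0's first action from its Gaussian policy at state x0.
rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.0])
u0 = rng.multivariate_normal(-gains[0][0] @ x0, covs[0][0])
```

In this sketch the stochasticity of the policies only adds constant terms to each agent's value function, so the gains coincide with those of the deterministic feedback game; the maximum-entropy structure shows up in the covariances. The nonlinear case described in the abstract would require repeated local linear-quadratic approximation around a trajectory, which is not shown here.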
