ML · LG · Jan 7, 2018

Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations

arXiv:1801.02124v2 · 45 citations
AI Analysis

This paper addresses the problem of learning reward functions from imperfect demonstrations in competitive multi-agent settings. The contribution is incremental: it builds on prior work by handling sub-optimal demonstrations without decoupling the agents.

The paper tackles inverse reinforcement learning in zero-sum stochastic games where the expert demonstrations are sub-optimal. It introduces a new objective function that compares expert strategies against Nash Equilibrium strategies, and it develops algorithms for both Nash Equilibrium computation and reward recovery. In numerical experiments on large-scale games, the algorithms recover reward functions and strategies of good quality.

This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. In our setting the model and algorithm do not decouple by agent. In order to find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to previous approaches using tabular representations. Moreover, with sub-optimal expert demonstrations our algorithms recover both reward functions and strategies with good quality.
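To make the idea of comparing an expert against Nash Equilibrium play concrete, here is a minimal toy sketch. It assumes a single-state zero-sum game (a plain payoff matrix) and uses classical fictitious play, a tabular method, in place of the paper's adversarial training algorithm with neural networks. The names `fictitious_play`, `expected_payoff`, `worst_case_payoff`, the matrix `A`, and the expert mix `x_expert` are all hypothetical and chosen for illustration only.

```python
# Toy sketch only: a single-state zero-sum game (a payoff matrix), solved with
# classical fictitious play. This is NOT the paper's adversarial training
# algorithm; every name below is a hypothetical choice for illustration.

def fictitious_play(A, iters=20000):
    """Approximate a Nash Equilibrium of the zero-sum matrix game A.

    A[i][j] is the payoff to the row player (maximizer); the column player
    (minimizer) receives -A[i][j]. Returns the empirical mixed strategies.
    """
    m, n = len(A), len(A[0])
    row_counts = [0] * m
    col_counts = [0] * n
    row_counts[0] = col_counts[0] = 1  # arbitrary first moves
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical play so far.
        row_vals = [sum(A[i][j] * col_counts[j] for j in range(n)) for i in range(m)]
        col_vals = [sum(A[i][j] * row_counts[i] for i in range(m)) for j in range(n)]
        best_row = max(range(m), key=lambda i: row_vals[i])
        best_col = min(range(n), key=lambda j: col_vals[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = iters + 1  # one initial move plus one move per iteration
    x = [c / total for c in row_counts]
    y = [c / total for c in col_counts]
    return x, y


def expected_payoff(A, x, y):
    """Row player's expected payoff when the players mix with x and y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def worst_case_payoff(A, x):
    """Payoff that row strategy x guarantees against a best-responding opponent."""
    return min(sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0])))


A = [[1, -1], [-1, 1]]             # matching pennies; its game value is 0
x_nash, y_nash = fictitious_play(A)
game_value = expected_payoff(A, x_nash, y_nash)

x_expert = [0.8, 0.2]              # hypothetical sub-optimal demonstrator
gap = game_value - worst_case_payoff(A, x_expert)
print("approximate Nash strategies:", x_nash, y_nash)
print("sub-optimality gap of the expert:", gap)
```

The gap is positive whenever the expert's strategy can be exploited by the opponent, and it shrinks toward zero as the expert approaches equilibrium play. The paper's setting is harder in two ways this sketch ignores: the games are multi-state stochastic games too large for tabular methods, and the reward function itself is unknown and must be recovered.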

Code Implementations: 1 repo
