ML, LG | Jul 14, 2013

Probabilistic inverse reinforcement learning in unknown environments

arXiv:1307.3785v1 | 13 citations
Originality: Incremental advance
AI Analysis

This paper addresses inverse reinforcement learning for agents acting in environments whose dynamics are unknown. The advance is incremental: it extends existing probabilistic approaches, developed for known MDPs, to this setting.

The paper tackles the problem of learning agent preferences from demonstrations in unknown stochastic Markov environments to construct improved policies, achieving competitive performance against methods that assume known dynamics.

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.
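The abstract's point that maximum a posteriori estimation under a flat prior reduces to a convex problem can be illustrated with a much simpler stand-in. The sketch below is not the paper's algorithm: it assumes a tabular softmax policy with one free logit per state-action pair and fits it to demonstrations by maximum likelihood, which coincides with MAP when the prior is flat. The function names, the toy demonstrations, and the step-size settings are all hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax_policy(states, actions, n_states, n_actions, lr=0.5, n_iters=2000):
    """Fit a tabular softmax policy to (state, action) demonstrations.

    Assumption: the demonstrator's policy is pi(a | s) = softmax(theta[s])[a]
    with a flat prior on theta, so MAP equals maximum likelihood. The
    log-likelihood is concave in theta, so plain gradient ascent suffices.
    """
    # Count how often each action was demonstrated in each state.
    counts = np.zeros((n_states, n_actions))
    np.add.at(counts, (states, actions), 1.0)
    visits = counts.sum(axis=1, keepdims=True)

    theta = np.zeros((n_states, n_actions))
    n = max(len(states), 1)
    for _ in range(n_iters):
        pi = softmax(theta)
        # Gradient of the log-likelihood: observed counts minus expected counts.
        grad = (counts - visits * pi) / n
        theta += lr * grad
    return softmax(theta)


# Toy demonstrations (hypothetical): two states, two actions.
demo_states = np.array([0, 0, 0, 0, 1, 1, 1, 1])
demo_actions = np.array([0, 0, 0, 1, 1, 1, 0, 1])
policy = fit_softmax_policy(demo_states, demo_actions, n_states=2, n_actions=2)
print(np.round(policy, 3))
```

With unconstrained logits the fitted policy simply recovers the empirical action frequencies in each state. The models described in the abstract go further by tying the policy to an estimated utility, which this sketch does not attempt.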
