ML · LG · SY · DS · PR · May 4, 2016

A Bayesian Approach to Policy Recognition and State Representation Learning

arXiv:1605.01278v4 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of building more flexible and practical behavioral models for system control in robotics and AI. It appears incremental: it extends Bayesian methods to learning from demonstration (LfD) without claiming a broad state-of-the-art breakthrough.

The paper tackles learning from demonstration (LfD) by addressing limitations of existing methods, such as the assumption of a deterministic optimal policy. It proposes a Bayesian approach that models the posterior distribution of expert controllers and infers the complexity of the state representation, yielding a more general framework that accommodates arbitrary stochastic expert policies.

Learning from demonstration (LfD) is the process of building behavioral models of a task from demonstrations provided by an expert. These models can be used e.g. for system control by generalizing the expert demonstrations to previously unencountered situations. Most LfD methods, however, make strong assumptions about the expert behavior, e.g. they assume the existence of a deterministic optimal ground truth policy or require direct monitoring of the expert's controls, which limits their practical use as part of a general system identification framework. In this work, we consider the LfD problem in a more general setting where we allow for arbitrary stochastic expert policies, without reasoning about the optimality of the demonstrations. Following a Bayesian methodology, we model the full posterior distribution of possible expert controllers that explain the provided demonstration data. Moreover, we show that our methodology can be applied in a nonparametric context to infer the complexity of the state representation used by the expert, and to learn task-appropriate partitionings of the system state space.
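
To make the idea of a posterior over stochastic expert policies concrete, here is a minimal sketch in a deliberately simplified setting. It is not the paper's model. It assumes a small discrete state and action space, expert actions that are observed directly (an assumption the paper's setting does not require), and an independent symmetric Dirichlet prior on each state's action distribution. The function names, the demonstration format, and the prior concentration `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def policy_posterior(demos, n_states, n_actions, alpha=1.0):
    """Return Dirichlet posterior parameters, one row per state.

    demos: iterable of (state, action) index pairs (hypothetical format).
    alpha: symmetric Dirichlet prior concentration (an assumed choice).
    """
    counts = np.zeros((n_states, n_actions))
    for s, a in demos:
        counts[s, a] += 1
    # Conjugate update: posterior parameters = prior + observed counts.
    return counts + alpha


def sample_policies(post, n_samples, rng):
    """Draw stochastic policies from the posterior.

    Returns an array of shape (n_samples, n_states, n_actions) in which
    each row over actions is a probability vector.
    """
    return np.stack([
        np.array([rng.dirichlet(row) for row in post])
        for _ in range(n_samples)
    ])


# Toy demonstrations: state 2 is never visited by the expert.
demos = [(0, 1), (0, 1), (0, 0), (1, 2), (1, 2), (1, 2)]
post = policy_posterior(demos, n_states=3, n_actions=3)

# Posterior mean policy; the unvisited state falls back to the prior.
mean_policy = post / post.sum(axis=1, keepdims=True)

# A set of plausible expert policies rather than a single point estimate.
samples = sample_policies(post, 1000, np.random.default_rng(0))
```

Each posterior sample is itself a stochastic policy, so the spread across samples shows how strongly the demonstrations constrain the expert's behavior in each state. In this toy case the unvisited state keeps the full uncertainty of the prior.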

