RO · AI · LG · Oct 27, 2017

Inverse Reinforcement Learning Under Noisy Observations

arXiv:1710.10116v1 · 6 citations
Originality: Incremental advance
AI Analysis

The paper addresses learning from imperfect observations in applications such as penetrating a patrolled perimeter, but the advance is incremental: it extends existing methods to continuous-time observations and to observation models that include actions.

The paper tackles inverse reinforcement learning when the expert's trajectory is observed only through noisy continuous-time data, such as sound. It presents an algorithm based on expectation maximization and the maximum entropy principle to handle this non-linear, non-convex problem, enabling learning even under extreme noise.

We consider the problem of performing inverse reinforcement learning when the trajectory of the expert is not perfectly observed by the learner. Instead, a noisy continuous-time observation of the trajectory is provided to the learner. This problem has wide-ranging applications; the specific application we consider here is the scenario in which the learner seeks to penetrate a perimeter patrolled by a robot. The learner's field of view is limited, so it cannot observe the patroller's complete trajectory. Instead, we allow the learner to listen to the sound of the expert's movement, which it can use to estimate the expert's state and action through an observation model. We treat the expert's state and action as hidden data and present an algorithm based on expectation maximization and the maximum entropy principle to solve the non-linear, non-convex problem. Related work considers discrete-time observations and an observation model that does not include actions. In contrast, our technique takes expectations over both the state and the action of the expert, enabling learning even in the presence of extreme noise and supporting broader applications.
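The idea of treating the expert's state and action as hidden data can be illustrated with a small sketch. This is not the paper's algorithm: it is a discrete-time toy, whereas the paper works with continuous-time observations, and every name and number in it (`posterior_over_state_action`, `expected_feature_counts`, the three-state patrol line, the sound likelihoods) is a hypothetical chosen for illustration. It shows only the expectation step: Bayes' rule gives a posterior over (state, action) pairs for each noisy observation, and those posteriors yield expected feature counts. In a maximum entropy formulation, a maximization step would then typically adjust the reward weights until the model's feature expectations match these counts.

```python
import numpy as np


def posterior_over_state_action(prior, likelihood):
    """Bayes' rule: P(s, a | o) is proportional to P(o | s, a) * P(s, a)."""
    joint = prior * likelihood
    total = joint.sum()
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return joint / total


def expected_feature_counts(posteriors, features):
    """Sum over time steps of E[phi(s, a) | o_t].

    posteriors: list of arrays of shape (S, A)
    features:   array of shape (S, A, K)
    """
    return sum(
        np.tensordot(p, features, axes=([0, 1], [0, 1])) for p in posteriors
    )


# Hypothetical toy problem: 3 states on a patrol line, 2 actions.
n_states, n_actions = 3, 2

# Assumed uniform prior over (state, action) pairs.
prior = np.full((n_states, n_actions), 1.0 / (n_states * n_actions))

# One-hot feature vector for each (state, action) pair.
features = np.eye(n_states * n_actions).reshape(n_states, n_actions, -1)

# Made-up sound likelihoods P(o_t | s, a) for two observations.
likelihoods = [
    np.array([[0.70, 0.10], [0.10, 0.05], [0.03, 0.02]]),
    np.array([[0.05, 0.10], [0.10, 0.60], [0.05, 0.10]]),
]

posteriors = [posterior_over_state_action(prior, lik) for lik in likelihoods]
phi_hat = expected_feature_counts(posteriors, features)
print(phi_hat)
```

Because the expectation runs over both state and action, a noisy observation spreads its weight across all plausible (state, action) pairs instead of committing to a single estimate.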

