LG, Jan 29, 2018

Learning the Reward Function for a Misspecified Model

arXiv:1801.09624v3, 15 citations
Originality: Incremental advance
AI Analysis

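This paper addresses a specific issue in reinforcement learning for agents with imperfect models: what reward to assign to states that only a flawed dynamics model produces. It is an incremental advance because it extends the existing Hallucinated DAgger-MC algorithm rather than introducing a new one.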

The paper tackles the problem of learning reward functions in model-based reinforcement learning when the dynamics model is flawed, and reports that its approach to reward learning can yield dramatic improvements in control performance.

In model-based reinforcement learning it is typical to decouple the problems of learning the dynamics model and learning the reward function. However, when the dynamics model is flawed, it may generate erroneous states that would never occur in the true environment. It is not clear a priori what value the reward function should assign to such states. This paper presents a novel error bound that accounts for the reward model's behavior in states sampled from the model. This bound is used to extend the existing Hallucinated DAgger-MC algorithm, which offers theoretical performance guarantees in deterministic MDPs that do not assume a perfect model can be learned. Empirically, this approach to reward learning can yield dramatic improvements in control performance when the dynamics model is flawed.
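
To make the issue concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm or its experimental setup. The environment, the flawed model, and every function name below are hypothetical, and pairing model-generated states with rewards observed in the real environment is only one plausible reading of "the reward model's behavior in states sampled from the model". The sketch compares a reward table fit on real states with one fit on the states the flawed model actually visits.

```python
from typing import Callable, Dict, List

# Hypothetical toy setup: a deterministic walk on the integers.
def true_step(s: int, a: int) -> int:
    return s + a            # real environment dynamics

def flawed_model_step(s: int, a: int) -> int:
    return s + 2 * a        # learned model with a systematic scale error

def true_reward(s: int) -> float:
    return 1.0 if s == 3 else 0.0   # reward only at the goal state 3

def rollout(step: Callable[[int, int], int], s0: int, actions: List[int]) -> List[int]:
    """Return the list of states visited from s0 under the given actions."""
    states = [s0]
    for a in actions:
        states.append(step(states[-1], a))
    return states

def fit_reward_table(inputs: List[int], targets: List[float]) -> Dict[int, float]:
    """Tabular regression: average the target reward seen for each input state."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(inputs, targets):
        sums[x] = sums.get(x, 0.0) + y
        counts[x] = counts.get(x, 0) + 1
    return {x: sums[x] / counts[x] for x in sums}

def predicted_return(table: Dict[int, float], states: List[int], default: float = 0.0) -> float:
    """Sum predicted rewards along a trajectory; unseen states get the default."""
    return sum(table.get(s, default) for s in states)

actions = [1, 1, 1]
real_states = rollout(true_step, 0, actions)           # states the environment visits
model_states = rollout(flawed_model_step, 0, actions)  # states the flawed model visits
real_rewards = [true_reward(s) for s in real_states]   # rewards observed in the real world

# Decoupled reward learning: inputs are real states.
standard_table = fit_reward_table(real_states, real_rewards)
# Assumed "hallucinated" variant: inputs are model states, targets are real rewards.
hallucinated_table = fit_reward_table(model_states, real_rewards)

true_return = sum(real_rewards)
standard_estimate = predicted_return(standard_table, model_states)
hallucinated_estimate = predicted_return(hallucinated_table, model_states)

print(real_states, model_states)
print(true_return, standard_estimate, hallucinated_estimate)
```

In this toy, the flawed model never reaches the goal state, so the table fit on real states predicts no reward along the model's rollout. The table fit on model states recovers the true return, because it assigns the observed reward to the erroneous state that the model puts in the goal's place.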

Code Implementations: 1 repo
