LG · AI · Dec 19, 2016

Self-Correcting Models for Model-Based Reinforcement Learning

arXiv:1612.06018v2 · 103 citations
Originality: Incremental advance
AI Analysis

This work addresses compounding errors in model-based reinforcement learning (MBRL) for agents with imperfect models. It provides theoretical insights and an algorithm whose performance guarantees are robust to model class limitations. The advance is incremental, since it builds on prior self-correction methods.

The paper tackles the failure of MBRL caused by compounding errors in flawed dynamics models. It presents a novel error bound showing that a model's ability to self-correct is more tightly related to MBRL performance than its one-step prediction error, along with an algorithm for deterministic MDPs whose performance guarantees are robust to model class limitations.
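The gap between one-step prediction error and the error of composed predictions can be seen in a toy setting. The sketch below is illustrative only: the 1-D linear dynamics, the coefficients 0.9 and 0.95, and all function names are hypothetical and are not taken from the paper. It measures the model's error when the model receives the true current state at every step, and compares that with the error when the model receives its own previous prediction, as it would during planning.

```python
from typing import List, Sequence

def true_step(s: float, a: float) -> float:
    # Hypothetical "true" deterministic dynamics (illustration only).
    return 0.9 * s + a

def model_step(s: float, a: float) -> float:
    # Hypothetical flawed model: its decay coefficient is slightly wrong.
    return 0.95 * s + a

def true_states(s0: float, actions: Sequence[float]) -> List[float]:
    # States visited by the true system just before each action is taken.
    states, s = [], s0
    for a in actions:
        states.append(s)
        s = true_step(s, a)
    return states

def one_step_errors(states: Sequence[float], actions: Sequence[float]) -> List[float]:
    # The model always receives the TRUE current state: no composition of predictions.
    return [abs(model_step(s, a) - true_step(s, a)) for s, a in zip(states, actions)]

def rollout_errors(s0: float, actions: Sequence[float]) -> List[float]:
    # The model receives its OWN previous prediction, so errors are composed.
    s_true, s_model, errs = s0, s0, []
    for a in actions:
        s_true = true_step(s_true, a)
        s_model = model_step(s_model, a)
        errs.append(abs(s_model - s_true))
    return errs

actions = [1.0] * 20
print(max(one_step_errors(true_states(0.0, actions), actions)))  # about 0.43
print(rollout_errors(0.0, actions)[-1])                          # about 4.05
```

In this toy case every one-step error stays below 0.5, yet after twenty composed steps the prediction is off by more than 4, which is the kind of compounding the paper is concerned with.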

When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations.
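One way to read "training the model to correct itself" is as a change in the training data, sketched below under simplifying assumptions. Besides the ordinary pairs (true state, action) → true next state, the model is also given pairs whose input is its own predicted state and whose target is the true next state from the same trajectory. This is a simplified, deterministic illustration: the function names and the choice to roll out only from the first state of the trajectory are hypothetical, and this is not presented as the exact procedure of Talvitie (2014) or of this paper.

```python
from typing import Callable, List, Sequence, Tuple

State = float
Action = float
Model = Callable[[State, Action], State]
Pair = Tuple[Tuple[State, Action], State]  # ((input state, action), target state)

def standard_pairs(states: Sequence[State], actions: Sequence[Action]) -> List[Pair]:
    # Ordinary one-step data: (true s_t, a_t) -> true s_{t+1}.
    return [((states[t], actions[t]), states[t + 1]) for t in range(len(actions))]

def hallucinated_pairs(model: Model, states: Sequence[State],
                       actions: Sequence[Action]) -> List[Pair]:
    # Self-correction data (sketch): roll the model out from the true start state
    # and pair each PREDICTED state with the TRUE next state, so that training
    # pushes the model's own errors back toward the real trajectory.
    pairs: List[Pair] = []
    s_hat = states[0]
    for t, a in enumerate(actions):
        pairs.append(((s_hat, a), states[t + 1]))
        s_hat = model(s_hat, a)  # feed the model its own prediction next step
    return pairs

flawed = lambda s, a: 0.95 * s + a           # hypothetical flawed model
traj = [0.0, 1.0, 1.9, 2.71]                 # true states under s' = 0.9 s + a
acts = [1.0, 1.0, 1.0]
for (s_in, a), target in hallucinated_pairs(flawed, traj, acts):
    print(round(s_in, 4), a, target)
```

The two datasets agree on the first pair. After that, the hallucinated inputs drift with the model's own errors while the targets stay on the real trajectory, which gives the model a training signal for steering back toward reality.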

Code implementations: 1 repo