LG, AI, ML
Aug 29, 2018

Approximate Exploration through State Abstraction

arXiv:1808.09819v2
14 citations
AI Analysis

This work addresses the challenge of making exploration practical in reinforcement learning. Its contribution is incremental: it builds on existing pseudo-count methods.

The paper tackles the problem of making theoretical exploration methods practical in reinforcement learning by studying approximate exploration, specifically analyzing pseudo-count based bonuses with state abstraction. It shows that approximation trades off learning speed and policy quality, identifies mismatches in pseudo-count derivations, and derives a new bonus to address this.

Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an exploration algorithm, MBIE-EB (Strehl and Littman, 2008), when explicitly combined with state aggregation. This allows us to confirm that, as might be expected, approximation allows the agent to trade off between learning speed and quality of the learned policy. Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration. Then, we show that a given density model also defines an implicit abstraction, and find a surprising mismatch between pseudo-counts derived either implicitly or explicitly. Finally, we derive a new pseudo-count bonus alleviating this issue.
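To make the quantities named in the abstract concrete, here is a minimal Python sketch of the standard pseudo-count of Bellemare et al. (2016) and an MBIE-EB style bonus, computed over aggregated states. It is an illustration under stated assumptions, not the paper's new bonus: the abstraction `phi`, the visit list `visited`, and the choice of an empirical (count-based) density model over abstract states are all hypothetical, and the bonus is keyed on states only rather than on state-action pairs.

```python
import math
from collections import Counter
from fractions import Fraction


def pseudo_count(rho, rho_prime):
    """Pseudo-count from Bellemare et al. (2016).

    rho is the density model's probability of x before observing it,
    rho_prime the probability it assigns to x after one more observation of x.
    """
    return rho * (1 - rho_prime) / (rho_prime - rho)


def mbie_eb_bonus(count, beta=1.0):
    """MBIE-EB style exploration bonus, beta / sqrt(count)."""
    return beta / math.sqrt(count)


def phi(state):
    # Hypothetical abstraction: aggregate integer states into bins of width 2.
    return state // 2


# Hypothetical history of visited ground states.
visited = [0, 1, 1, 2, 5, 4, 4]
counts = Counter(phi(s) for s in visited)
n = len(visited)


def empirical_pseudo_count(state):
    """Pseudo-count under an empirical density model over abstract states."""
    c = counts[phi(state)]
    rho = Fraction(c, n)                # probability before the new observation
    rho_prime = Fraction(c + 1, n + 1)  # probability after observing it once more
    return pseudo_count(rho, rho_prime)


# With an empirical density model, the pseudo-count equals the visit count
# of the abstract state that contains the queried ground state.
print(empirical_pseudo_count(0), counts[phi(0)])
print(mbie_eb_bonus(float(empirical_pseudo_count(0))))
```

In this special case the density model and the abstraction agree by construction, so the pseudo-count bonus coincides with the count-based bonus on the aggregated states. The mismatch discussed in the abstract concerns density models for which this correspondence does not hold exactly.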

