LG · ML · Jan 25, 2019

Provably efficient RL with Rich Observations via Latent State Decoding

arXiv:1901.09018v3 · 249 citations
Originality: Highly original
AI Analysis

This paper addresses efficient reinforcement learning in environments with complex (rich) observations, offering researchers and practitioners a provably efficient method with significant performance gains.

The paper tackles exploration in episodic MDPs whose rich observations are generated from a small number of latent states. It estimates a mapping from observations to latent states through a sequence of regression and clustering steps, uses this mapping to build exploration policies, provides finite-sample guarantees, and evaluates the approach empirically. The method improves exponentially over Q-learning with naive exploration, even when Q-learning has access to the latent states.
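To see why naive exploration can be exponentially slow, consider a hypothetical lock-style episodic MDP (our illustration, not necessarily the paper's benchmark) with horizon $H$ and $A$ actions, where at each step exactly one action keeps the agent on the path to the only reward. A uniformly random policy, which is roughly how ε-greedy Q-learning behaves before it has seen any reward, then succeeds with geometric waiting time:

```latex
\Pr[\text{reach reward in one episode}] = \left(\frac{1}{A}\right)^{H} = A^{-H},
\qquad
\mathbb{E}[\text{episodes until first reward}] = \frac{1}{A^{-H}} = A^{H}.
```

For example, with $A = 2$ and $H = 20$ this is $2^{20} = 1{,}048{,}576$ episodes in expectation. Exploration that deliberately reaches each decoded latent state avoids paying this $A^{H}$ factor, which is the intuition behind the exponential improvement claimed here.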

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps -- where previously decoded latent states provide labels for later regression problems -- and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.
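The inductive decoding step described in the abstract (previously decoded states provide regression labels, then the regression outputs are clustered) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: we assume the labels are one-hot encodings of the previous level's decoded state and action, use ordinary least squares as the regressor and k-means as the clustering step, and generate toy data in which the next latent state is the XOR of the previous state and action. The function name `decode_level` and the toy setup are our own.

```python
import numpy as np


def decode_level(obs, prev_states, actions, n_prev_states, n_actions,
                 n_clusters, n_iters=50):
    """Decode latent states at level h from observations (illustrative sketch).

    obs:         (n, d) observations at level h
    prev_states: (n,) decoded latent states at level h-1 (ints)
    actions:     (n,) actions taken at level h-1 (ints)
    Returns an (n,) array of cluster ids, i.e. decoded states at level h.
    """
    n = obs.shape[0]
    # Labels come from the previous level: one-hot of the (state, action) pair.
    pair = prev_states * n_actions + actions
    targets = np.zeros((n, n_prev_states * n_actions))
    targets[np.arange(n), pair] = 1.0

    # Regression step: least squares (with a bias column) gives the best linear
    # approximation of E[one-hot pair | observation], i.e. backward probabilities.
    X = np.hstack([obs, np.ones((n, 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    preds = X @ W

    # Clustering step: k-means on the predicted vectors,
    # initialised by farthest-first traversal for determinism.
    centers = [preds[0]]
    for _ in range(1, n_clusters):
        dists = np.min([((preds - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(preds[np.argmax(dists)])
    centers = np.array(centers)

    def assign(c):
        # Index of the nearest center for every predicted vector.
        return np.argmin(((preds[:, None, :] - c[None, :, :]) ** 2).sum(axis=2), axis=1)

    for _ in range(n_iters):
        labels = assign(centers)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = preds[labels == k].mean(axis=0)
    return assign(centers)


# Toy usage (hypothetical dynamics): next latent state = previous state XOR action.
rng = np.random.default_rng(0)
n, d = 2000, 20
prev = rng.integers(0, 2, size=n)
act = rng.integers(0, 2, size=n)
latent = prev ^ act
obs = rng.normal(scale=0.1, size=(n, d))   # high-dimensional noise
obs[np.arange(n), latent] += 1.0           # latent state embedded in the observation
decoded = decode_level(obs, prev, act, n_prev_states=2, n_actions=2, n_clusters=2)

# Cluster ids are arbitrary, so score agreement up to swapping the two labels.
agree = max(np.mean(decoded == latent), np.mean(decoded != latent))
print(f"agreement with true latent states: {agree:.3f}")
```

Observations are grouped only by their predicted distribution over previous (state, action) pairs, so decoded ids match the true latent states only up to relabelling; that is why the agreement score takes the better of the two label assignments. A richer regressor (for example a neural network) could replace least squares without changing the shape of the step.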

Code Implementations: 1 repo
