AI · LG · Nov 12, 2019

Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

arXiv:1911.05010v2 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the inefficiency of learning and planning in partially observable environments. It offers reinforcement learning practitioners a more integrated approach, an incremental advance over existing spectral learning methods.

The paper tackles learning and planning in partially observable reinforcement learning by proposing a novel algorithm that integrates the two processes. In empirical tests on two domains, it shows better sample and time efficiency than traditional two-stage methods.

Learning and planning in partially observable domains is one of the most difficult problems in reinforcement learning. Traditional methods treat these two problems as independent, resulting in a classical two-stage paradigm: first learn the environment dynamics, then plan accordingly. This approach, however, disconnects the two problems and can consequently lead to algorithms that are sample inefficient and time consuming. In this paper, we propose a novel algorithm that combines learning and planning. Our algorithm is closely related to the spectral learning algorithm for predictive state representations and offers appealing theoretical guarantees and time complexity. We empirically show on two domains that our approach is more sample and time efficient than classical methods.
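For background on the spectral learning the abstract builds on, here is a minimal sketch of the classic observable-operator spectral algorithm in its simplest, action-free form, in the style of Hsu, Kakade and Zhang's HMM learner. It is not the paper's combined learning-and-planning algorithm: it omits actions, rewards and the unnormalized Q functions entirely. To keep the sketch checkable, the low-order moments are computed exactly from a small hypothetical HMM (the values of `pi`, `T` and `O` are made up) rather than estimated from sampled trajectories. All function names are illustrative assumptions.

```python
import numpy as np

def hmm_moments(pi, T, O):
    """Exact low-order moments of a toy HMM (stand-in for empirical estimates).

    pi: initial state distribution, shape (m,)
    T:  column-stochastic transitions, T[i, j] = Pr[h' = i | h = j], shape (m, m)
    O:  column-stochastic emissions, O[x, i] = Pr[obs = x | h = i], shape (n, m)
    """
    P1 = O @ pi                                   # P1[i] = Pr[x1 = i]
    P21 = O @ T @ np.diag(pi) @ O.T               # P21[i, j] = Pr[x2 = i, x1 = j]
    # P3x1[x][i, j] = Pr[x3 = i, x2 = x, x1 = j]
    P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
            for x in range(O.shape[0])]
    return P1, P21, P3x1

def spectral_learn(P1, P21, P3x1, k):
    """Rank-k spectral estimate of the initial state, normalizer and operators."""
    U = np.linalg.svd(P21)[0][:, :k]              # top-k left singular vectors
    b1 = U.T @ P1                                 # initial predictive state
    binf = np.linalg.pinv(P21.T @ U) @ P1         # normalization vector
    UP21_pinv = np.linalg.pinv(U.T @ P21)
    B = [U.T @ P3 @ UP21_pinv for P3 in P3x1]     # one k x k operator per observation
    return b1, binf, B

def seq_prob(b1, binf, B, seq):
    """Pr[x1..xt] = binf^T B_xt ... B_x1 b1 under the learned operators."""
    state = b1
    for x in seq:
        state = B[x] @ state
    return float(binf @ state)

def true_prob(pi, T, O, seq):
    """Forward algorithm on the true HMM: 1^T A_xt ... A_x1 pi, A_x = T diag(O[x])."""
    state = pi
    for x in seq:
        state = T @ (O[x] * state)
    return float(state.sum())

# Hypothetical toy model: 2 hidden states, 3 observations.
pi = np.array([0.6, 0.4])
T = np.array([[0.7, 0.2], [0.3, 0.8]])
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])

b1, binf, B = spectral_learn(*hmm_moments(pi, T, O), k=2)
for seq in [(0,), (2, 1), (0, 2, 2, 1)]:
    print(seq, seq_prob(b1, binf, B, seq), true_prob(pi, T, O, seq))
```

With exact moments and an observation matrix of full column rank, the learned operators reproduce the true sequence probabilities up to floating-point error; with empirical moments the same steps give a consistent estimate. The method is closed-form (one SVD and a few pseudo-inverses, with no iterative local search such as EM), which is the kind of efficiency the abstract refers to.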
