LGAIMLMay 25, 2016

A PAC RL Algorithm for Episodic POMDPs

arXiv:1605.08062v260 citations
AI Analysis

This work provides a significant improvement for researchers and practitioners in AI and robotics dealing with episodic POMDPs, though it is incremental as it builds on existing method of moments techniques.

The authors tackled the problem of reinforcement learning in partially observable environments by developing an algorithm with polynomial sample complexity, addressing the previously exponential bounds in episode length.

Many interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in method of moments for latent variable model estimation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes