A PAC RL Algorithm for Episodic POMDPs
This work provides a significant improvement for researchers and practitioners in AI and robotics dealing with episodic POMDPs, though it is incremental as it builds on existing method of moments techniques.
The authors tackled the problem of reinforcement learning in partially observable environments by developing an algorithm with polynomial sample complexity, addressing the previously exponential bounds in episode length.
Many interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in method of moments for latent variable model estimation.