LG MLMar 1, 2018

On Oracle-Efficient PAC RL with Rich Observations

Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

arXiv:1803.00606v426.2107 citations

Originality Incremental advance

AI Analysis

This work addresses computational efficiency in reinforcement learning for researchers and practitioners dealing with complex environments, though it is incremental as it builds on prior oracle-based methods and highlights limitations.

The paper tackles the computational tractability of PAC reinforcement learning with rich observations by presenting new provably sample-efficient algorithms for deterministic hidden state dynamics and stochastic rich observations, operating in an oracle model to avoid enumeration. It also shows that the known sample-efficient algorithm OLIVE cannot be implemented in this model for stochastic hidden state dynamics and provides examples highlighting fundamental challenges in tractable PAC RL.

We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.

View on arXiv PDF

Similar