LG AI ML

Mar 22, 2017

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Christoph Dann, Tor Lattimore, Emma Brunskill

arXiv:1703.07710v328.4326 citations

Has Code

Originality Highly original

AI Analysis

This work addresses a gap in statistical performance bounds for RL, which matter in high-stakes applications like healthcare, by introducing a new theoretical framework.

The paper addresses how to measure the performance of reinforcement learning algorithms theoretically. It introduces the Uniform-PAC framework, which strengthens PAC and can be used to derive high-probability regret guarantees, and demonstrates it on finite-state episodic MDPs with an algorithm whose regret and PAC guarantees are optimal except for a factor of the horizon.

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
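The two quantities that the abstract says Uniform-PAC bridges can be illustrated on a toy sequence. The sketch below is not the paper's algorithm: it assumes each episode has an optimality gap (the value of an optimal policy minus the value of the policy played), and the gap values and function names are hypothetical. A PAC bound limits the number of episodes whose gap exceeds one fixed accuracy level eps, regret is the sum of the gaps, and a Uniform-PAC bound limits the mistake count for every eps at once.

```python
from typing import Dict, Sequence


def count_eps_mistakes(gaps: Sequence[float], eps: float) -> int:
    """Number of episodes whose optimality gap is strictly larger than eps."""
    return sum(1 for gap in gaps if gap > eps)


def cumulative_regret(gaps: Sequence[float]) -> float:
    """Regret over the run: the sum of the per-episode optimality gaps."""
    return sum(gaps)


# Hypothetical per-episode gaps for a learner that improves over time.
gaps = [0.9, 0.5, 0.3, 0.2, 0.1, 0.05, 0.05, 0.0]

# A PAC statement concerns one eps; a Uniform-PAC statement concerns all of
# them simultaneously, so we tabulate the mistake count at several levels.
mistakes: Dict[float, int] = {
    eps: count_eps_mistakes(gaps, eps) for eps in (0.5, 0.25, 0.1, 0.01)
}
total_regret = cumulative_regret(gaps)

print(mistakes)
print(total_regret)
```

Tabulating the mistake count at several accuracy levels shows why a guarantee for a single eps says nothing about how small the remaining gaps are, whereas control over every level constrains their sum as well.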
