LG MLApr 12, 2020

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

arXiv:2004.05599v316.227 citationsHas Code

Originality Highly original

AI Analysis

This provides a theoretical foundation for kernel-based RL with weak assumptions, benefiting researchers in continuous control and sparse reward tasks.

The paper tackles the exploration-exploitation dilemma in finite-horizon reinforcement learning with metric state-action spaces by introducing Kernel-UCBVI, a model-based optimistic algorithm that uses kernel estimators, achieving a regret bound of $\widetilde{O}\left( H^3 K^{rac{2d}{2d+1}} ight)$ for $K$ episodes and horizon $H$, where $d$ is the covering dimension.

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.

View on arXiv PDF Code

Similar