LGMLApr 12, 2020

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

arXiv:2004.05599v327 citations
AI Analysis

This provides a theoretical foundation for kernel-based RL with weak assumptions, benefiting researchers in continuous control and sparse reward tasks.

The paper tackles the exploration-exploitation dilemma in finite-horizon reinforcement learning with metric state-action spaces by introducing Kernel-UCBVI, a model-based optimistic algorithm that uses kernel estimators, achieving a regret bound of $\widetilde{O}\left( H^3 K^{ rac{2d}{2d+1}} ight)$ for $K$ episodes and horizon $H$, where $d$ is the covering dimension.

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ episodes and horizon $H$, we provide a regret bound of $\widetilde{O}\left( H^3 K^{\frac{2d}{2d+1}}\right)$, where $d$ is the covering dimension of the joint state-action space. This is the first regret bound for kernel-based RL using smoothing kernels, which requires very weak assumptions on the MDP and has been previously applied to a wide range of tasks. We empirically validate our approach in continuous MDPs with sparse rewards.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes