LG AI, Jun 26, 2019

A Tractable Algorithm For Finite-Horizon Continuous Reinforcement Learning

Phanideep Gampa, Sairam Satwik Kondamudi, Lakshmanan Kailasam

arXiv:1906.11245v1, 1 citation

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient learning in continuous state spaces for RL practitioners, representing an incremental improvement over prior theoretical bounds.

The authors address finite-horizon reinforcement learning over continuous state spaces. They give a tractable algorithm based on optimistic value iteration, prove a regret lower bound of Ω(T^{2/3}) for any algorithm that discretizes the state space (improving the previous Ω(T^{1/2}) bound), and show that the discretization error is at most const · L n^{-α} T when rewards and transitions are Hölder continuous.

We consider the finite-horizon continuous reinforcement learning problem. Our contribution is three-fold. First, we give a tractable algorithm based on optimistic value iteration for the problem. Second, we give a lower bound on regret of order $\Omega(T^{2/3})$ for any algorithm that discretizes the state space, improving the previous regret bound of $\Omega(T^{1/2})$ of Ortner and Ryabko \cite{contrl} for the same problem. Third, under the assumption that the rewards and transitions are Hölder continuous, we show that the upper bound on the discretization error is $\mathrm{const} \cdot L n^{-\alpha} T$. Finally, we give some simple experiments to validate our propositions.
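To make the shape of the discretization bound concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a one-dimensional state space [0, 1] split into n equal-width intervals (an assumption, since the abstract does not state the state space), and it omits the unspecified constant. The function names `state_to_interval` and `holder_gap_bound` are hypothetical. The reasoning behind the second function is the standard one: if a function is Hölder continuous with constant L and exponent α, it varies by at most L n^{-α} across an interval of width 1/n, and summing that per-step gap over T steps gives L n^{-α} T.

```python
def state_to_interval(s: float, n: int) -> int:
    """Map a state s in [0, 1] to the index (0..n-1) of its equal-width interval.

    Assumes the state space is [0, 1]; this is an illustrative choice.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("state must lie in [0, 1]")
    # s = 1.0 would give index n, so clamp it into the last interval.
    return min(int(s * n), n - 1)


def holder_gap_bound(L: float, alpha: float, n: int, T: int) -> float:
    """Return L * n**(-alpha) * T, the bound's shape with the constant omitted.

    L and alpha are the Hölder constant and exponent, n is the number of
    intervals, and T is the number of steps.
    """
    return L * n ** (-alpha) * T
```

Under these assumptions the bound shrinks as the grid is refined: for α = 1, doubling n halves it, while for smaller α the decay in n is slower.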
