LG, SY, ML, Mar 14, 2019

On Applications of Bootstrap in Continuous Space Reinforcement Learning

arXiv:1903.05803v2, 15 citations
Originality: Incremental advance
AI Analysis

This paper addresses the trade-off between identification and control in reinforcement learning over continuous state and action spaces, offering a bootstrap-based policy approach with theoretical guarantees.

The paper studies decision-making in continuous state and action spaces under linear dynamical models with quadratic costs. It shows that bootstrap-based policies achieve regret that scales as the square root of time, and it gives results on how accurately the model's dynamics are learned.

In decision making problems for continuous state and action spaces, linear dynamical models are widely employed. Specifically, policies for stochastic linear systems subject to quadratic cost functions capture a large number of applications in reinforcement learning. Selected randomized policies have been studied in the literature recently that address the trade-off between identification and control. However, little is known about policies based on bootstrapping observed states and actions. In this work, we show that bootstrap-based policies achieve a square root scaling of regret with respect to time. We also obtain results on the accuracy of learning the model's dynamics. Corroborative numerical analysis that illustrates the technical results is also provided.
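To make the idea concrete, the following is a minimal sketch of a bootstrap-style policy for a scalar linear system with quadratic cost. It is not the paper's algorithm: the scalar model x_next = a*x + b*u + noise, the 10-step random warm-up, the per-step re-estimation, and every name (least_squares, lqr_gain, run) are assumptions made for illustration. At each step the observed (state, action, next state) triples are resampled with replacement, the dynamics are re-estimated by least squares on the resample, and the control applies the LQR gain computed for that estimate, so the resampling randomness supplies the exploration.

```python
import random


def least_squares(data):
    """Fit (a, b) in x_next ~ a*x + b*u by ordinary least squares.

    data is a list of (x, u, x_next) triples. Returns None when the
    2x2 normal equations are singular.
    """
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        return None
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def lqr_gain(a, b, q=1.0, r=1.0, iters=500):
    """Scalar LQR gain k (control u = -k*x) via Riccati fixed-point iteration."""
    p = q
    for _ in range(iters):
        # Scalar discrete-time Riccati recursion, written without powers.
        p = q + a * a * p * r / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def run(a_true=1.05, b_true=0.5, horizon=400, noise_std=0.5, warmup=10, seed=0):
    """Simulate the sketch; returns (final estimate, last gain, total cost)."""
    rng = random.Random(seed)
    x, gain, total_cost, data = 0.0, 0.0, 0.0, []
    for t in range(horizon):
        # Assumed warm-up: purely random actions for the first few steps.
        u = rng.gauss(0.0, 1.0) if t < warmup else -gain * x
        x_next = a_true * x + b_true * u + rng.gauss(0.0, noise_std)
        total_cost += x * x + u * u
        data.append((x, u, x_next))
        x = x_next
        if t >= warmup - 1:
            # Bootstrap step: resample observed triples with replacement,
            # re-estimate the dynamics, and act on the resampled estimate.
            resample = rng.choices(data, k=len(data))
            est = least_squares(resample)
            if est is not None:
                gain = lqr_gain(*est)
    return least_squares(data), gain, total_cost
```

Calling run() and comparing the returned estimate with (a_true, b_true) gives a rough feel for how the dynamics are learned while the system is being controlled. The regret analysis in the paper concerns its own policy, not this simplified sketch.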

