LGAIMLMay 5, 2019

Learning to Control in Metric Space with Optimal Regret

arXiv:1905.01576v130 citations
Originality Highly original
AI Analysis

This work addresses the problem of optimal regret in metric spaces for researchers in reinforcement learning, offering a foundational approach with broad applicability.

The paper tackles online reinforcement learning for deterministic control systems with arbitrary state-action spaces by proposing a simple algorithm using a function approximation oracle, achieving a regret bound of O(HL(KH)^{(d-1)/d}) and establishing a near-matching lower bound.

We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces. Suppose that the transition dynamics and reward function is unknown, but the state and action space is endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after $K$ episodes is $O(HL(KH)^{\frac{d-1}{d}}) $ where $L$ is a smoothness parameter, and $d$ is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieve the optimal regret.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes