MLLGDec 9, 2019

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

arXiv:1912.04136v1146 citations
Originality Highly original
AI Analysis

This addresses a fundamental challenge in reinforcement learning for scenarios requiring more flexible function approximations, representing a significant advance rather than an incremental improvement.

The paper tackles the problem of episodic reinforcement learning with generalized linear function approximation by designing a new algorithm that achieves a regret bound of $ ilde{O}(\sqrt{d^3 T})$, which is the first statistically and computationally efficient solution in this setting.

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\tilde{O}(\sqrt{d^3 T})$ where $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes