LGJan 14

Eluder dimension: localise it!

Alireza Bakhtiari, Alex Ayoub, Samuel Robertson, David Janz, Csaba Szepesvári

arXiv:2601.09825v11 citations

Originality Highly original

AI Analysis

This addresses a theoretical bottleneck in regret analysis for reinforcement learning and bandit algorithms, providing a novel approach to improve performance bounds.

The paper tackles the problem that standard eluder dimension analysis fails to achieve first-order regret bounds for generalised linear models, and introduces a localisation method that recovers and improves classic bandit results while enabling first-order bounds for finite-horizon reinforcement learning with bounded cumulative returns.

We establish a lower bound on the eluder dimension of generalised linear model classes, showing that standard eluder dimension-based analysis cannot lead to first-order regret bounds. To address this, we introduce a localisation method for the eluder dimension; our analysis immediately recovers and improves on classic results for Bernoulli bandits, and allows for the first genuine first-order bounds for finite-horizon reinforcement learning tasks with bounded cumulative returns.

View on arXiv PDF

Similar