ML LGFeb 23

Gap-Dependent Bounds for Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

Haochen Zhang, Zhong Zheng, Lingzhou Xue

arXiv:2602.20297v11.7h-index: 21

Originality Incremental advance

AI Analysis

This work addresses a theoretical gap in reinforcement learning for researchers, providing incremental improvements in analysis for algorithms with linear function approximation.

The paper tackles the problem of establishing gap-dependent regret bounds for nearly minimax-optimal reinforcement learning algorithms with linear function approximation, specifically for LSVI-UCB++, resulting in improved dependencies on feature dimension and horizon length, and introduces a concurrent variant for multi-agent settings with linear speedup.

We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $\tilde{O}(d\sqrt{H^3K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the first gap-dependent sample complexity upper bound for online multi-agent RL with linear function approximation, achieving linear speedup with respect to the number of agents.

View on arXiv PDF

Similar