Optimal and Practical Batched Linear Bandit Algorithm
This solves the problem of balancing theoretical optimality and practical efficiency in batched linear bandits for researchers and practitioners in reinforcement learning.
The paper tackles the batched linear bandit problem by proposing BLAE, a novel algorithm that integrates arm elimination with regularized G-optimal design, achieving minimax optimal regret (up to logarithmic factors) in all regimes with only O(log log T) batches and demonstrating strong empirical performance.
We study the linear bandit problem under limited adaptivity, known as the batched linear bandit. While existing approaches can achieve near-optimal regret in theory, they are often computationally prohibitive or underperform in practice. We propose BLAE, a novel batched algorithm that integrates arm elimination with regularized G-optimal design, achieving the minimax optimal regret (up to logarithmic factors in $T$) in both large-$K$ and small-$K$ regimes for the first time, while using only $O(\log\log T)$ batches. Our analysis introduces new techniques for batch-wise optimal design and refined concentration bounds. Crucially, BLAE demonstrates low computational overhead and strong empirical performance, outperforming state-of-the-art methods in extensive numerical evaluations. Thus, BLAE is the first algorithm to combine provable minimax-optimality in all regimes and practical superiority in batched linear bandits.