Improved Regret of Linear Ensemble Sampling
This work advances the theoretical foundation of ensemble sampling in bandit algorithms, making it competitive with other randomized methods.
The paper tackles the theoretical gap in linear ensemble sampling by proving it can achieve a frequentist regret bound of Õ(d^{3/2}√T) with ensemble size logarithmic in T, matching state-of-the-art randomized linear bandit algorithms. It also shows this framework applies to Linear Perturbed-History Exploration (LinPHE), deriving the same bound independent of the number of arms.
In this work, we close the fundamental gap of theory and practice by providing an improved regret bound for linear ensemble sampling. We prove that with an ensemble size logarithmic in $T$, linear ensemble sampling can achieve a frequentist regret bound of $\tilde{O}(d^{3/2}\sqrt{T})$, matching state-of-the-art results for randomized linear bandit algorithms, where $d$ and $T$ are the dimension of the parameter and the time horizon respectively. Our approach introduces a general regret analysis framework for linear bandit algorithms. Additionally, we reveal a significant relationship between linear ensemble sampling and Linear Perturbed-History Exploration (LinPHE), showing that LinPHE is a special case of linear ensemble sampling when the ensemble size equals $T$. This insight allows our analysis framework to derive a regret bound of $\tilde{O}(d^{3/2}\sqrt{T})$ for LinPHE, independent of the number of arms. Our techniques advance the theoretical foundation of ensemble sampling, bringing its regret bounds in line with the best known bounds for other randomized exploration algorithms.