Heavy-Tailed Linear Bandits: Huber Regression with One-Pass Update
This work addresses computational inefficiency in heavy-tailed bandit algorithms for researchers and practitioners in online learning, though it is incremental as it builds on prior adaptive Huber regression methods.
The paper tackles the problem of stochastic linear bandits with heavy-tailed noise by proposing a one-pass algorithm that reduces per-round computational cost from O(t log T) to O(1) and achieves a near-optimal, variance-aware regret bound of order ~O(d T^{(1-ε)/(2(1+ε))} √(∑ ν_t^2) + d T^{(1-ε)/(2(1+ε))}).
We study the stochastic linear bandits with heavy-tailed noise. Two principled strategies for handling heavy-tailed noise, truncation and median-of-means, have been introduced to heavy-tailed bandits. Nonetheless, these methods rely on specific noise assumptions or bandit structures, limiting their applicability to general settings. The recent work [Huang et al.2024] develops a soft truncation method via the adaptive Huber regression to address these limitations. However, their method suffers undesired computational costs: it requires storing all historical data and performing a full pass over these data at each round. In this paper, we propose a \emph{one-pass} algorithm based on the online mirror descent framework. Our method updates using only current data at each round, reducing the per-round computational cost from $\mathcal{O}(t \log T)$ to $\mathcal{O}(1)$ with respect to current round $t$ and the time horizon $T$, and achieves a near-optimal and variance-aware regret of order $\widetilde{\mathcal{O}}\big(d T^{\frac{1-ε}{2(1+ε)}} \sqrt{\sum_{t=1}^T ν_t^2} + d T^{\frac{1-ε}{2(1+ε)}}\big)$ where $d$ is the dimension and $ν_t^{1+ε}$ is the $(1+ε)$-th central moment of reward at round $t$.