LG AI MLMay 26, 2022

Variance-Aware Sparse Linear Bandits

Tsinghua

arXiv:2205.13450v311.813 citationsh-index: 49

Originality Incremental advance

AI Analysis

This work addresses the challenge of adapting to noise variance in sparse linear bandits, which is incremental as it builds on prior variance-aware methods but applies them to the sparse setting.

The paper tackles the problem of sparse linear bandits by introducing a variance-aware regret bound that interpolates between worst-case and deterministic regimes, achieving a regret of ̃O(√d∑σ_t^2 + 1) with concrete theoretical guarantees. It develops a black-box framework to convert variance-aware linear bandit algorithms into sparse versions, demonstrating the bounds using two existing algorithms.

It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity which is much smaller than the ambient dimension, the worst-case minimax regret is $\widetildeΘ\left(\sqrt{dT}\right)$ where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T σ_t^2} + 1\right)$, where $σ_t^2$ is the variance of the noise at the $t$-th round. This bound naturally interpolates the regret bounds for the worst-case constant-variance regime (i.e., $σ_t \equiv Ω(1)$) and the benign deterministic regimes (i.e., $σ_t \equiv 0$). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second one is more efficient.

View on arXiv PDF

Similar