LG MLNov 1, 2023

Fixed-Budget Best-Arm Identification in Sparse Linear Bandits

arXiv:2311.00481v13.82 citationsh-index: 5

Originality Incremental advance

AI Analysis

This work addresses the challenge of efficiently identifying the best arm in high-dimensional sparse linear bandits, which is crucial for applications like recommendation systems and clinical trials, though it is incremental as it builds on prior methods like thresholded Lasso and OD-LinBAI.

The paper tackles the best-arm identification problem in sparse linear bandits under a fixed budget, proposing a two-phase algorithm (Lasso-OD) that leverages sparsity to achieve an error probability exponent dependent on sparsity s rather than dimension d, and demonstrates near-minimax optimality and significant performance improvements over existing non-sparse methods in numerical examples.

We study the best-arm identification problem in sparse linear bandits under the fixed-budget setting. In sparse linear bandits, the unknown feature vector $θ^*$ may be of large dimension $d$, but only a few, say $s \ll d$ of these features have non-zero values. We design a two-phase algorithm, Lasso and Optimal-Design- (Lasso-OD) based linear best-arm identification. The first phase of Lasso-OD leverages the sparsity of the feature vector by applying the thresholded Lasso introduced by Zhou (2009), which estimates the support of $θ^*$ correctly with high probability using rewards from the selected arms and a judicious choice of the design matrix. The second phase of Lasso-OD applies the OD-LinBAI algorithm by Yang and Tan (2022) on that estimated support. We derive a non-asymptotic upper bound on the error probability of Lasso-OD by carefully choosing hyperparameters (such as Lasso's regularization parameter) and balancing the error probabilities of both phases. For fixed sparsity $s$ and budget $T$, the exponent in the error probability of Lasso-OD depends on $s$ but not on the dimension $d$, yielding a significant performance improvement for sparse and high-dimensional linear bandits. Furthermore, we show that Lasso-OD is almost minimax optimal in the exponent. Finally, we provide numerical examples to demonstrate the significant performance improvement over the existing algorithms for non-sparse linear bandits such as OD-LinBAI, BayesGap, Peace, LinearExploration, and GSE.

View on arXiv PDF

Similar