Self-Concordant Perturbations for Linear Bandits
This work addresses the adversarial linear bandits problem for researchers in online learning and bandit optimization, offering incremental improvements in regret bounds for specific domains.
The paper tackles the adversarial linear bandits problem by introducing a novel FTPL-based algorithm using self-concordant perturbations, achieving a regret of O(d√(n ln n)) on both the d-dimensional hypercube and Euclidean ball, with a √d improvement over existing methods on the hypercube.
We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $O(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the Euclidean ball. On the Euclidean ball, this matches the rate attained by existing self-concordant FTRL methods. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.