LG MLDec 24, 2023

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

Yuko Kuroki, Alberto Rumi, Taira Tsuchiya, Fabio Vitale, Nicolò Cesa-Bianchi

arXiv:2312.15433v213.014 citationsh-index: 48AISTATS

Originality Incremental advance

AI Analysis

This provides robust solutions for sequential decision-making in uncertain environments, though it is incremental as it builds on existing best-of-both-worlds frameworks.

The paper tackles the problem of designing algorithms for linear contextual bandits that perform well in both adversarial and stochastic environments without prior knowledge, achieving polylogarithmic regret in the stochastic regime and near-optimal bounds like O(dK√L*) in the adversarial regime.

We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\mathrm{poly}\log(dKT)}{Δ_{\min}}$, where $Δ_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either the first-order $\widetilde{O}(dK\sqrt{L^*})$ bound, or the second-order $\widetilde{O}(dK\sqrt{Λ^*})$ bound, where $L^*$ is the cumulative loss of the best action and $Λ^*$ is a notion of the cumulative second moment for the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with Shannon entropy regularizer that does not require the knowledge of the inverse of the covariance matrix, and achieves a polylogarithmic regret in the stochastic regime while obtaining $\widetilde{O}\big(dK\sqrt{T}\big)$ regret bounds in the adversarial regime.

View on arXiv PDF

Similar