LG MLMar 23, 2021

Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions

arXiv:2103.12487v211.918 citations

Originality Synthesis-oriented

AI Analysis

This work provides incremental improvements to theoretical analysis for bandit algorithms, benefiting researchers in online learning and decision-making under uncertainty.

The paper tackles the problem of improving regret bounds for the Tsallis-INF algorithm in bandit settings, deriving tighter bounds for stochastically constrained adversarial bandits and stochastic bandits with adversarial corruptions, with specific expressions involving parameters like time horizon, number of arms, suboptimality gaps, and corruption magnitude.

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(Δ,C,T)$ self-bounding constraint the algorithm achieves $\mathcal{O}\left(\left(\sum_{i\neq i^*} \frac{1}{Δ_i}\right)\log_+\left(\frac{(K-1)T}{\left(\sum_{i\neq i^*} \frac{1}{Δ_i}\right)^2}\right)+\sqrt{C\left(\sum_{i\neq i^*}\frac{1}{Δ_i}\right)\log_+\left(\frac{(K-1)T}{C\sum_{i\neq i^*}\frac{1}{Δ_i}}\right)}\right)$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $Δ_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max\left(1,\log x\right)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis, which allows to achieve the same kind of improvement for generalizations of Tsallis-INF to other settings beyond multiarmed bandits.

View on arXiv PDF

Similar