Adversarial Bandits against Arbitrary Strategies
This work addresses a theoretical challenge in online learning for adversarial bandits, offering incremental improvements in regret bounds for scenarios with switching strategies.
The paper tackles the adversarial bandit problem with an unknown parameter S for switches in the best arm, achieving regret bounds of O~(S^{1/2}K^{1/3}T^{2/3}) and improved O~(min{√(SKTρ), S√(KT)}) using adaptive learning rates.
We study the adversarial bandit problem against arbitrary strategies, where the difficulty is captured by an unknown parameter $S$, which is the number of switches in the best arm in hindsight. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD, achieving $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which $T^{2/3}$ comes from the variance of loss estimators. To mitigate the impact of the variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\sqrt{SKTρ},S\sqrt{KT}\})$, where $ρ$ is a variance term for loss estimators.