Online Learning with Switching Costs and Other Adaptive Adversaries
This work studies adaptive adversaries in online learning and is aimed at machine learning and optimization researchers. It provides new lower bounds and characterizations, though the contribution is incremental in that it builds on existing regret frameworks.
The paper studies online learning against adaptive adversaries. It shows that with switching costs and bandit feedback the attainable regret rate is Θ̃(T^{2/3}), which is worse than the Θ(√T) rate attainable with switching costs under full information. It also proves that a bounded memory adversary can force Θ̃(T^{2/3}) regret even with full information.
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(\sqrt{T})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force $\widetilde{\Theta}(T^{2/3})$ regret even in the full information case, proving that switching costs are easier to control than bounded memory adversaries. Our lower bounds rely on a new stochastic adversary strategy that generates loss processes with strong dependencies.
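
To make the switching-cost setting concrete, the following is a minimal sketch of how regret against the best fixed action could be computed when every change of action incurs an extra cost. It assumes a fixed (oblivious) loss table plus a unit cost per switch, which is one common formalization and not necessarily the exact definition used in the paper. The function name `switching_cost_regret` and its parameters are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def switching_cost_regret(
    losses: Sequence[Sequence[float]],
    actions: Sequence[int],
    switch_cost: float = 1.0,
) -> float:
    """Regret of an action sequence versus the best fixed action.

    losses[t][a] is the loss of action a in round t (assumed fixed in advance).
    The player also pays switch_cost each time the action changes between
    consecutive rounds. A fixed action never switches, so the comparator
    pays no switching cost.
    """
    if len(losses) != len(actions):
        raise ValueError("losses and actions must cover the same rounds")
    if not losses:
        return 0.0

    # Loss the player actually incurs on the chosen actions.
    player_loss = sum(losses[t][a] for t, a in enumerate(actions))

    # Number of rounds in which the action differs from the previous round.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)

    # Cumulative loss of the best single action in hindsight.
    num_actions = len(losses[0])
    best_fixed = min(sum(row[a] for row in losses) for a in range(num_actions))

    return player_loss + switch_cost * switches - best_fixed
```

In this sketch, a player who chases the per-round minimizer may achieve zero loss on the table yet still accumulate regret through switches, which is the tension the switching-cost bounds quantify.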