ML LGSep 22, 2021

On Optimal Robustness to Adversarial Corruption in Online Decision Problems

arXiv:2109.10963v113.228 citations

Originality Incremental advance

AI Analysis

This work addresses robustness in sequential decision-making for machine learning and optimization, providing tight theoretical guarantees against corruption, but it is incremental as it builds on existing frameworks.

This paper tackles the problem of achieving optimal robustness against adversarial corruption in online decision-making, specifically in prediction with expert advice and multi-armed bandit settings, showing that algorithms achieve a regret bound of O((log N)/Δ + sqrt((C log N)/Δ)) and proving it is tight with matching lower bounds.

This paper considers two fundamental sequential decision-making problems: the problem of prediction with expert advice and the multi-armed bandit problem. We focus on stochastic regimes in which an adversary may corrupt losses, and we investigate what level of robustness can be achieved against adversarial corruptions. The main contribution of this paper is to show that optimal robustness can be expressed by a square-root dependency on the amount of corruption. More precisely, we show that two classes of algorithms, anytime Hedge with decreasing learning rate and algorithms with second-order regret bounds, achieve $O( \frac{\log N}Δ + \sqrt{ \frac{C \log N }Δ } )$-regret, where $N, Δ$, and $C$ represent the number of experts, the gap parameter, and the corruption level, respectively. We further provide a matching lower bound, which means that this regret bound is tight up to a constant factor. For the multi-armed bandit problem, we also provide a nearly tight lower bound up to a logarithmic factor.

View on arXiv PDF

Similar