LGMLOct 4, 2022

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

arXiv:2210.01376v25 citationsh-index: 64
Originality Incremental advance
AI Analysis

This work addresses a fundamental challenge in online learning with partial feedback, offering more efficient algorithms for applications like contextual bandits, though it is incremental in refining existing frameworks.

The paper tackles the problem of achieving optimal high-probability regret bounds for adversarial bandits with time-varying feedback graphs, developing algorithms that remove poly(K) dependencies and improve upon existing bounds for both strongly and weakly observable graphs.

We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^Tα_t)^{1/2}+\max_{t\in[T]}α_t)$ with high probability, where $α_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015] which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any $\text{poly}(K)$ dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without self-loop in a strongly observable graph.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes