Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs
This work addresses the problem of optimizing regret in online learning with feedback graphs for researchers and practitioners, representing a significant but incremental advance in algorithm design.
The study tackles online learning with directed feedback graphs by developing best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments and poly-logarithmic regret for stochastic environments, resolving an open question from prior work.
This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of $\tildeΘ( α^{1/2} T^{1/2} )$, while weakly observable graphs induce minimax regret of $\tildeΘ( δ^{1/3} T^{2/3} )$, where $α$ and $δ$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph. Our proposed algorithm for strongly observable graphs has a regret bound of $\tilde{O}( α^{1/2} T^{1/2} ) $ for adversarial environments, as well as of $ {O} ( \frac{α(\ln T)^3 }{Δ_{\min}} ) $ for stochastic environments, where $Δ_{\min}$ expresses the minimum suboptimality gap. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of $\tilde{O}( δ^{1/3}T^{2/3} )$ for adversarial environments and poly-logarithmic regret for stochastic environments. The proposed algorithms are based on the follow-the-regularized-leader approach combined with newly designed update rules for learning rates.