LGFeb 18, 2025

Alternating Regret for Online Convex Optimization

Soumita Hait, Ping Li, Haipeng Luo, Mengxiao Zhang

arXiv:2502.12529v29.43 citationsh-index: 69COLT

Originality Highly original

AI Analysis

This work advances online learning theory by resolving a key open problem for alternating dynamics in two-player games, with implications for equilibrium finding in convex games.

The paper tackles the open question of whether $o(\sqrt{T})$ alternating regret is achievable for adversarial Online Convex Optimization (OCO), answering affirmatively by showing the continuous Hedge algorithm achieves $\tilde{\mathcal{O}}(d^{\frac{2}{3}}T^{\frac{1}{3}})$ alternating regret, and proposes another algorithm that improves time complexity and dimension dependence, achieving bounds like $\tilde{\mathcal{O}}(T^{\frac{2}{5}})$ for specific cases.

Motivated by alternating learning dynamics in two-player games, a recent work by Cevher et al.(2024) shows that $o(\sqrt{T})$ alternating regret is possible for any $T$-round adversarial Online Linear Optimization (OLO) problem, and left as an open question whether the same is true for general Online Convex Optimization (OCO). We answer this question in the affirmative by showing that the continuous Hedge algorithm achieves $\tilde{\mathcal{O}}(d^{\frac{2}{3}}T^{\frac{1}{3}})$ alternating regret for any adversarial $d$-dimensional OCO problems. We show that this implies an alternating learning dynamic that finds a Nash equilibrium for any convex-concave zero-sum games or a coarse correlated equilibrium for any convex two-player general-sum games at a rate of $\tilde{\mathcal{O}}(d^{\frac{2}{3}}/T^{\frac{2}{3}})$. To further improve the time complexity and/or the dimension dependence, we propose another simple algorithm, Follow-the-Regularized-Leader with a regularizer whose convex conjugate is 3rd-order smooth, for OCO with smooth and self-concordant loss functions (such as linear or quadratic losses). We instantiate our algorithm with different regularizers and show that, for example, when the decision set is the $\ell_2$ ball, our algorithm achieves $\tilde{\mathcal{O}}(T^{\frac{2}{5}})$ alternating regret with no dimension dependence (and a better $\tilde{\mathcal{O}}(T^{\frac{1}{3}})$ bound for quadratic losses). We complement our results by showing some algorithm-specific alternating regret lower bounds, including a somewhat surprising $Ω(\sqrt{T})$ lower bound for a Regret Matching variant that is widely used in alternating learning dynamics.

View on arXiv PDF

Similar