LGMay 20, 2023

Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting

arXiv:2305.12131v48 citations
Originality Incremental advance
AI Analysis

This work addresses delayed feedback in dynamic environments for online learning systems, offering incremental improvements over prior stationary-focused methods.

The paper tackles non-stationary delayed online convex optimization by proposing algorithms for full-information and bandit settings, achieving dynamic regret bounds of O(√(dT(P_T+1))) with proven optimality and matching lower bounds.

Although online convex optimization (OCO) under arbitrary delays has received increasing attention recently, previous studies focus on stationary environments with the goal of minimizing static regret. In this paper, we investigate the delayed OCO in non-stationary environments, and choose dynamic regret with respect to any sequence of comparators as the performance metric. To this end, we first propose an algorithm called Mild-OGD for the full-information case, where delayed gradients are available. The basic idea is to maintain multiple experts in parallel, each performing a gradient descent step with different learning rates for every delayed gradient according to their arrival order, and utilize a meta-algorithm to track the best one based on their delayed performance. Despite the simplicity of this idea, our novel analysis shows that the dynamic regret of Mild-OGD can be automatically bounded by $O(\sqrt{\bar{d}T(P_T+1)})$ under the in-order assumption and $O(\sqrt{dT(P_T+1)})$ in the worst case, where $\bar{d}$ and $d$ denote the average and maximum delay respectively, $T$ is the time horizon, and $P_T$ is the path-length of comparators. Moreover, we demonstrate that the result in the worst case is optimal by deriving a matching lower bound. Finally, we develop a bandit variant of Mild-OGD for a more challenging case with only delayed loss values. Interestingly, we prove that under a relatively large amount of delay, our bandit algorithm even enjoys the best dynamic regret bound of existing non-delayed bandit algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes