LGMay 20, 2023

Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting

Yuanyu Wan, Chang Yao, Yitao Ma, Mingli Song, Lijun Zhang

arXiv:2305.12131v46.68 citations

Originality Incremental advance

AI Analysis

This work addresses delayed feedback in dynamic environments for online learning systems, offering incremental improvements over prior stationary-focused methods.

The paper tackles non-stationary delayed online convex optimization by proposing algorithms for full-information and bandit settings, achieving dynamic regret bounds of O(√(dT(P_T+1))) with proven optimality and matching lower bounds.

Although online convex optimization (OCO) under arbitrary delays has received increasing attention recently, previous studies focus on stationary environments with the goal of minimizing static regret. In this paper, we investigate the delayed OCO in non-stationary environments, and choose dynamic regret with respect to any sequence of comparators as the performance metric. To this end, we first propose an algorithm called Mild-OGD for the full-information case, where delayed gradients are available. The basic idea is to maintain multiple experts in parallel, each performing a gradient descent step with different learning rates for every delayed gradient according to their arrival order, and utilize a meta-algorithm to track the best one based on their delayed performance. Despite the simplicity of this idea, our novel analysis shows that the dynamic regret of Mild-OGD can be automatically bounded by $O(\sqrt{\bar{d}T(P_T+1)})$ under the in-order assumption and $O(\sqrt{dT(P_T+1)})$ in the worst case, where $\bar{d}$ and $d$ denote the average and maximum delay respectively, $T$ is the time horizon, and $P_T$ is the path-length of comparators. Moreover, we demonstrate that the result in the worst case is optimal by deriving a matching lower bound. Finally, we develop a bandit variant of Mild-OGD for a more challenging case with only delayed loss values. Interestingly, we prove that under a relatively large amount of delay, our bandit algorithm even enjoys the best dynamic regret bound of existing non-delayed bandit algorithms.

View on arXiv PDF

Similar