Optimal Bounds for Adversarial Constrained Online Convex Optimization
This work gives optimal regret and constraint-violation bounds for constrained online learning problems. The contribution is incremental in that it refines existing bounds in the field.
The paper tackles the problem of achieving optimal bounds for both regret and cumulative constraint violation in constrained online convex optimization against an adaptive adversary, showing that it is possible to obtain O(√T) bounds for both, improving prior results.
Constrained Online Convex Optimization (COCO) can be seen as a generalization of the standard Online Convex Optimization (OCO) framework. At each round, the learner chooses an action, after which a cost function and a constraint function are revealed. The goal is to minimize both the regret and the cumulative constraint violation (CCV) against an adaptive adversary. We show for the first time that it is possible to obtain the optimal $O(\sqrt{T})$ bound on both regret and CCV, improving on the best known bounds of $O(\sqrt{T})$ for the regret and $\tilde{O}(\sqrt{T})$ for the CCV. The improvement therefore lies in the CCV bound, which no longer carries the logarithmic factors hidden by $\tilde{O}$. Using a new surrogate loss function that enforces a minimum penalty on the constraint function, we show that both Follow-the-Regularized-Leader (FTRL) and Online Gradient Descent (OGD) achieve the optimal bounds.
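To make the COCO protocol and the two performance measures concrete, here is a minimal Python sketch of OGD run on a penalty-style surrogate. Every specific below is an assumption for illustration only: the 1-D action set [-1, 1], quadratic costs f_t(x) = (x - a_t)^2, linear constraints g_t(x) = x - b_t, and in particular the hypothetical fixed-weight surrogate (x - a_t)^2 + lam * max(0, g_t(x)). That surrogate is a generic stand-in, not the paper's surrogate loss, and it is not claimed to achieve the bounds above. Regret is measured against the best fixed action satisfying every constraint, and CCV as the sum of the positive parts of g_t(x_t), which are the usual COCO definitions (assumed here).

```python
import math
import random
from typing import List, Tuple


def project(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Euclidean projection onto the interval [lo, hi]."""
    return min(hi, max(lo, x))


def run_ogd_coco(a: List[float], b: List[float], lam: float = 2.0) -> Tuple[List[float], float, float]:
    """Projected OGD on the HYPOTHETICAL surrogate (x - a_t)^2 + lam * max(0, x - b_t).

    Costs f_t(x) = (x - a_t)^2 with a_t in [-1, 1], constraints g_t(x) = x - b_t,
    action set X = [-1, 1]. Returns (actions, regret, ccv).
    """
    T = len(a)
    diameter = 2.0                   # diameter of X = [-1, 1]
    grad_bound = 4.0 + lam           # |f_t'(x)| = 2|x - a_t| <= 4, plus the penalty slope lam
    eta = diameter / (grad_bound * math.sqrt(T))  # standard OGD step size D / (G sqrt(T))
    x = 0.0
    actions: List[float] = []
    for t in range(T):
        actions.append(x)            # learner commits to x_t before f_t and g_t are revealed
        grad = 2.0 * (x - a[t])      # gradient of f_t at x_t
        if x - b[t] > 0:             # constraint violated: add a subgradient of the penalty
            grad += lam
        x = project(x - eta * grad)  # gradient step, then project back onto X

    # CCV: sum of the positive parts of g_t(x_t)
    ccv = sum(max(0.0, x_t - b_t) for x_t, b_t in zip(actions, b))

    # Comparator: best fixed action with g_t <= 0 for all t, i.e. in [-1, min(1, min b_t)].
    # Assumes min b_t >= -1 so this set is nonempty; the 1-D quadratic is minimized by
    # clipping the mean of the a_t to that interval.
    upper = min(1.0, min(b))
    x_star = project(sum(a) / T, -1.0, upper)
    learner_loss = sum((x_t - a_t) ** 2 for x_t, a_t in zip(actions, a))
    best_loss = sum((x_star - a_t) ** 2 for a_t in a)
    return actions, learner_loss - best_loss, ccv


if __name__ == "__main__":
    random.seed(0)
    T = 1000
    a = [random.uniform(-1.0, 1.0) for _ in range(T)]
    b = [random.uniform(0.0, 0.5) for _ in range(T)]
    actions, regret, ccv = run_ogd_coco(a, b)
    print(f"T={T}  regret={regret:.3f}  CCV={ccv:.3f}")
```

The positive-part penalty keeps the surrogate convex whenever each g_t is convex, so the standard OGD analysis applies to it. Fixing lam, however, is the crudest possible choice; the abstract indicates that the paper's surrogate instead enforces a minimum penalty on the constraint function, and its exact form should be taken from the paper itself.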