Optimal Algorithms for Online Convex Optimization with Adversarial Constraints
This solves a foundational problem in online learning with constraints, enabling more robust and efficient algorithms for applications like resource allocation and control systems.
The paper tackles the problem of constrained online convex optimization (COCO) by designing an online learning policy that simultaneously achieves O(√T) regret and Õ(√T) cumulative constraint violation against an adaptive adversary, answering a long-standing open question, and improves regret to O(log T) for strongly convex costs.
A well-studied generalization of the standard online convex optimization (OCO) framework is constrained online convex optimization (COCO). In COCO, on every round, a convex cost function and a convex constraint function are revealed to the learner after it chooses the action for that round. The objective is to design an online learning policy that simultaneously achieves a small regret while ensuring a small cumulative constraint violation (CCV) against an adaptive adversary interacting over a horizon of length $T$. A long-standing open question in COCO is whether an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV without any restrictive assumptions. For the first time, we answer this in the affirmative and show that a simple first-order policy can simultaneously achieve these bounds. Furthermore, in the case of strongly convex cost and convex constraint functions, the regret guarantee can be improved to $O(\log T)$ while keeping the CCV bound the same as above. We establish these results by effectively combining adaptive OCO policies as a blackbox with Lyapunov optimization - a classic tool from control theory. Surprisingly, the analysis is short and elegant.