LG OC MLMay 10, 2025

Beyond $\tilde{O}(\sqrt{T})$ Constraint Violation for Online Convex Optimization with Adversarial Constraints

arXiv:2505.06709v211.43 citationsh-index: 3

Originality Incremental advance

AI Analysis

This work is significant for safety-critical applications that require strict adherence to constraints, as it provides a method to substantially reduce cumulative constraint violation at the cost of increased regret. This is an incremental improvement to existing online convex optimization algorithms.

This paper addresses Online Convex Optimization with adversarial constraints (COCO), where the goal is to minimize both regret and cumulative constraint violation (CCV). The authors propose a new online policy that achieves a trade-off between regret and CCV, specifically $\tilde{O}(\sqrt{dT}+ T^\beta)$ regret and $\tilde{O}(dT^{1-\beta})$ CCV, improving upon the previous $\tilde{O}(\sqrt{T})$ CCV for any bounded convex cost and constraint functions.

We study Online Convex Optimization with adversarial constraints (COCO). At each round a learner selects an action from a convex decision set and then an adversary reveals a convex cost and a convex constraint function. The goal of the learner is to select a sequence of actions to minimize both regret and the cumulative constraint violation (CCV) over a horizon of length $T$. The best-known policy for this problem achieves $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. In this paper, we improve this by trading off regret to achieve substantially smaller CCV. This trade-off is especially important in safety-critical applications, where satisfying the safety constraints is non-negotiable. Specifically, for any bounded convex cost and constraint functions, we propose an online policy that achieves $\tilde{O}(\sqrt{dT}+ T^β)$ regret and $\tilde{O}(dT^{1-β})$ CCV, where $d$ is the dimension of the decision set and $β\in [0,1]$ is a tunable parameter. We begin with a special case, called the $\textsf{Constrained Expert}$ problem, where the decision set is a probability simplex and the cost and constraint functions are linear. Leveraging a new adaptive small-loss regret bound, we propose a computationally efficient policy for the $\textsf{Constrained Expert}$ problem, that attains $O(\sqrt{T\ln N}+T^β)$ regret and $\tilde{O}(T^{1-β} \ln N)$ CCV for $N$ number of experts. The original problem is then reduced to the $\textsf{Constrained Expert}$ problem via a covering argument. Finally, with an additional $M$-smoothness assumption, we propose a computationally efficient first-order policy attaining $O(\sqrt{MT}+T^β)$ regret and $\tilde{O}(MT^{1-β})$ CCV.

View on arXiv PDF

Similar