Online Convex Optimization with Stochastic Constraints: Zero Constraint Violation and Bandit Feedback
This work addresses constraint violation in online optimization, with applications such as resource allocation, and offers a zero-violation guarantee that is an incremental improvement over prior work.
The paper tackles online convex optimization with stochastic constraints by proposing a variant of the drift-plus-penalty algorithm that guarantees O(√T) expected regret and zero constraint violation after a fixed number of iterations, improving upon the vanilla method, which incurs O(√T) constraint violation. It also extends this framework to the bandit feedback setting with similar guarantees.
This paper studies online convex optimization with stochastic constraints. We propose a variant of the drift-plus-penalty algorithm that guarantees $O(\sqrt{T})$ expected regret and zero constraint violation after a fixed number of iterations, improving upon the vanilla drift-plus-penalty method, which incurs $O(\sqrt{T})$ constraint violation. Unlike the vanilla drift-plus-penalty method, our algorithm is oblivious to the length of the time horizon $T$. This is made possible by our novel drift lemma, which provides time-varying bounds on the virtual queue drift and, as a result, time-varying bounds on the expected virtual queue length. Moreover, we extend our framework to stochastic-constrained online convex optimization under two-point bandit feedback. We show that, by adapting our algorithmic framework to the bandit feedback setting, we can still achieve $O(\sqrt{T})$ expected regret and zero constraint violation, improving upon previous work for the case of identical constraint functions. Numerical results corroborate our theoretical findings.
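To make the moving parts concrete, here is a minimal Python sketch of a generic, vanilla-style drift-plus-penalty loop with one virtual queue per constraint, together with the standard sphere-sampling two-point gradient estimator used under bandit feedback. It is not the paper's variant. The box feasible set, the toy loss `loss`, the stochastic constraint $x_1 + x_2 \le b_t$, the horizon-dependent tuning $V = \sqrt{T}$ and $\alpha = T$, and the smoothing radius `delta` are all illustrative assumptions, and `dpp_step`, `two_point_grad`, and `run_demo` are hypothetical names.

```python
import math
import random

# Hypothetical sketch: a vanilla-style drift-plus-penalty loop, NOT the paper's variant.
# Feasible set, loss, constraint and tuning (V, alpha, delta) are assumptions.


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def dpp_step(x, Q, grad_f, g_vals, g_grads, V, alpha, lo, hi):
    """One drift-plus-penalty step with virtual queues Q.

    Minimizes V<grad_f, y - x> + sum_k Q_k (g_k + <grad g_k, y - x>) + alpha ||y - x||^2
    over the box; completing the square gives a projected step along
    d = V grad_f + sum_k Q_k grad g_k with step size 1 / (2 alpha).
    """
    n = len(x)
    d = [V * grad_f[i] + sum(Qk * gg[i] for Qk, gg in zip(Q, g_grads))
         for i in range(n)]
    x_new = project_box([x[i] - d[i] / (2.0 * alpha) for i in range(n)], lo, hi)
    step = [x_new[i] - x[i] for i in range(n)]
    # Virtual queue update: Q_k <- max(Q_k + g_k(x) + <grad g_k(x), x_new - x>, 0).
    Q_new = [max(Qk + gk + dot(gg, step), 0.0)
             for Qk, gk, gg in zip(Q, g_vals, g_grads)]
    return x_new, Q_new


def two_point_grad(f, x, delta, rng):
    """Two-point bandit estimate (n / (2 delta)) (f(x + delta u) - f(x - delta u)) u,
    with u uniform on the unit sphere; only two loss values are queried."""
    n = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(u, u))
    u = [ui / norm for ui in u]
    f_plus = f([x[i] + delta * u[i] for i in range(n)])
    f_minus = f([x[i] - delta * u[i] for i in range(n)])
    scale = n * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def loss(y, c):
    """Toy loss f_t(y) = ||y - c_t||^2 (hypothetical example)."""
    return sum((yi - ci) ** 2 for yi, ci in zip(y, c))


def run_demo(T=2000, seed=0, bandit=False):
    """Toy run: the loss pulls x toward about (0.9, 0.9), while the stochastic
    constraint x1 + x2 <= b_t with E[b_t] = 1 pushes it back."""
    rng = random.Random(seed)
    V, alpha = math.sqrt(T), float(T)  # horizon-dependent tuning (assumed)
    x, Q = [0.5, 0.5], [0.0]
    cum_violation = 0.0
    for _ in range(T):
        c = [0.9 + rng.uniform(-0.1, 0.1) for _ in range(2)]
        b = 1.0 + rng.uniform(-0.2, 0.2)
        g_val, g_grad = x[0] + x[1] - b, [1.0, 1.0]
        if bandit:
            grad_f = two_point_grad(lambda y: loss(y, c), x, 1e-3, rng)
        else:
            grad_f = [2.0 * (xi - ci) for xi, ci in zip(x, c)]
        cum_violation += g_val
        x, Q = dpp_step(x, Q, grad_f, [g_val], [g_grad], V, alpha, 0.0, 1.0)
    return x, Q, cum_violation


if __name__ == "__main__":
    print(run_demo())
    print(run_demo(bandit=True))
```

With full-information gradients, the iterate in this toy run should settle near the constraint boundary $x_1 + x_2 \approx 1$, with the queue growing until its pressure balances the loss gradient. Summing the queue recursion shows that, in this sketch, the cumulative violation is at most the final queue length plus a bounded term, which is why bounds on the virtual queue length translate into bounds on constraint violation. Note that $V$ and $\alpha$ here depend on $T$, mirroring the horizon dependence attributed above to the vanilla method.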