LGSep 24, 2025

Beyond Slater's Condition in Online CMDPs with Stochastic and Adversarial Constraints

arXiv:2509.20114v2h-index: 17
Originality Highly original
AI Analysis

This work addresses the problem of online decision-making under constraints for researchers in reinforcement learning and optimization, offering significant improvements over prior methods but is incremental in advancing best-of-both-worlds algorithms.

The paper tackles online constrained Markov decision processes with stochastic and adversarial constraints by introducing a novel algorithm that improves state-of-the-art guarantees, achieving O~(√T) regret and constraint violation without Slater's condition and handling positive constraint violation in stochastic settings, while ensuring sublinear violation and α-regret in adversarial regimes.

We study \emph{online episodic Constrained Markov Decision Processes} (CMDPs) under both stochastic and adversarial constraints. We provide a novel algorithm whose guarantees greatly improve those of the state-of-the-art best-of-both-worlds algorithm introduced by Stradi et al. (2025). In the stochastic regime, \emph{i.e.}, when the constraints are sampled from fixed but unknown distributions, our method achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret and constraint violation without relying on Slater's condition, thereby handling settings where no strictly feasible solution exists. Moreover, we provide guarantees on the stronger notion of \emph{positive} constraint violation, which does not allow to recover from large violation in the early episodes by playing strictly safe policies. In the adversarial regime, \emph{i.e.}, when the constraints may change arbitrarily between episodes, our algorithm ensures sublinear constraint violation without Slater's condition, and achieves sublinear $α$-regret with respect to the \emph{unconstrained} optimum, where $α$ is a suitably defined multiplicative approximation factor. We further validate our results through synthetic experiments, showing the practical effectiveness of our algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes