Eleonora Fidelia Chiefari

h-index: 17
2 papers

GT · May 11
Online Resource Allocation With General Constraints

Eleonora Fidelia Chiefari, Francesco Emanuele Stradi, Matteo Castiglioni et al.

Online resource allocation (ORA) is a fundamental framework for sequential decision-making problems under budget constraints, with applications ranging from online advertising to revenue management. In this work, we study a broader setting that includes both budget constraints and general constraints, extending the classical budget-only model. This extension is essential for modeling critical economic requirements, such as Return-on-Investment (ROI) constraints. We develop an algorithm that achieves best-of-both-worlds guarantees within this generalized framework. In particular, against a dynamic benchmark, our algorithm achieves $\widetilde{\mathcal O}(\sqrt{T})$ regret in the \emph{stochastic} regime and $\alpha$-regret of order $\widetilde{\mathcal O}(\sqrt{T})$ in the \emph{adversarial} regime, where $\alpha$ depends on the feasibility margin of the corresponding offline problem. At the same time, our algorithm guarantees strict satisfaction of the budget constraints and $\widetilde{\mathcal O}(\sqrt{T})$ cumulative violation of the general ones. From a technical perspective, introducing general constraints alongside budgets precludes the use of standard budget-focused methods. While budget methods rely on a zero-consumption ``safe'' action to ensure feasibility, general constraints are much less ``aligned'' towards feasibility. We overcome these difficulties with a new analysis that exploits \emph{weak adaptivity} to bound the Lagrange multipliers and obtain best-of-both-worlds guarantees.
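As a rough illustration of the Lagrangian (primal-dual) template that budget-constrained ORA methods typically build on, the sketch below handles one budget and one general constraint $g \le 0$ (for instance, an ROI-style requirement). It is not the authors' algorithm: the function name `primal_dual_allocation`, the random request model, the step size `eta`, and the multiplier cap `lam_max` are all hypothetical. It assumes each request (reward, cost, and constraint value per action) is revealed before the decision and that action 0 is a zero-consumption void action. Under these assumptions the budget is never exceeded, whereas the general constraint has no such safe action and is only controlled on average through its multiplier.

```python
import random


def primal_dual_allocation(T, budget, n_actions=3, eta=0.05, lam_max=10.0, seed=0):
    """Hypothetical primal-dual sketch for online allocation with one budget
    constraint and one general constraint g <= 0. Not the paper's algorithm."""
    rng = random.Random(seed)
    rho = budget / T              # target per-round spending rate
    lam_b, lam_g = 0.0, 0.0       # Lagrange multipliers: budget, general constraint
    remaining = budget
    total_reward, cum_violation = 0.0, 0.0
    for _ in range(T):
        # Request revealed before the decision (assumed full information).
        # Action 0 is the void action: zero reward and zero budget consumption.
        reward = [0.0] + [rng.random() for _ in range(n_actions)]
        cost = [0.0] + [rng.random() for _ in range(n_actions)]
        # The general constraint has no guaranteed safe action, even for action 0.
        g = [rng.uniform(-1.0, 1.0) for _ in range(n_actions + 1)]
        # Restrict to affordable actions so the budget is never exceeded.
        feasible = [a for a in range(n_actions + 1) if cost[a] <= remaining]
        # Primal step: maximize the Lagrangian over affordable actions.
        a = max(feasible, key=lambda i: reward[i] - lam_b * (cost[i] - rho) - lam_g * g[i])
        remaining -= cost[a]
        total_reward += reward[a]
        cum_violation += g[a]
        # Dual step: projected gradient ascent. Clipping to [0, lam_max] is an
        # assumption of this sketch, not how the paper bounds the multipliers.
        lam_b = min(max(lam_b + eta * (cost[a] - rho), 0.0), lam_max)
        lam_g = min(max(lam_g + eta * g[a], 0.0), lam_max)
    return total_reward, remaining, cum_violation


reward, left, viol = primal_dual_allocation(T=1000, budget=200.0)
print(f"reward={reward:.1f}, budget left={left:.2f}, cumulative violation={viol:.2f}")
```

Restricting the argmax to affordable actions is what makes budget feasibility strict here; the general constraint is handled only through the penalty term $\lambda_g g$, which is why its violation is measured cumulatively rather than enforced per round.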

LG · Sep 24, 2025
Beyond Slater's Condition in Online CMDPs with Stochastic and Adversarial Constraints

Francesco Emanuele Stradi, Eleonora Fidelia Chiefari, Matteo Castiglioni et al.

We study \emph{online episodic Constrained Markov Decision Processes} (CMDPs) under both stochastic and adversarial constraints. We provide a novel algorithm whose guarantees substantially improve upon those of the state-of-the-art best-of-both-worlds algorithm introduced by Stradi et al. (2025). In the stochastic regime, \emph{i.e.}, when the constraints are sampled from fixed but unknown distributions, our method achieves $\widetilde{\mathcal{O}}(\sqrt{T})$ regret and constraint violation without relying on Slater's condition, thereby handling settings where no strictly feasible solution exists. Moreover, we provide guarantees on the stronger notion of \emph{positive} constraint violation, which does not allow the learner to compensate for large violations in early episodes by later playing strictly safe policies. In the adversarial regime, \emph{i.e.}, when the constraints may change arbitrarily between episodes, our algorithm ensures sublinear constraint violation without Slater's condition and achieves sublinear $\alpha$-regret with respect to the \emph{unconstrained} optimum, where $\alpha$ is a suitably defined multiplicative approximation factor. We further validate our results through synthetic experiments that demonstrate the practical effectiveness of our algorithm.
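To make the distinction between cumulative and positive violation concrete, the snippet below uses one common formalization for a single constraint, assumed here rather than taken from the paper: cumulative violation $[\sum_t g_t]^+$ versus positive violation $\sum_t [g_t]^+$, where $g_t > 0$ means the constraint is violated in episode $t$. The function names and the violation sequence are made up for illustration; with several constraints, the paper's definitions may differ (for example, by taking a maximum over constraints).

```python
def cumulative_violation(g):
    """[sum_t g_t]^+ : later negative (strictly safe) terms can cancel earlier violations."""
    return max(0.0, sum(g))


def positive_violation(g):
    """sum_t [g_t]^+ : only positive parts are summed, so nothing is cancelled."""
    return sum(max(0.0, x) for x in g)


# Hypothetical run: large violations in the first 10 episodes,
# followed by 10 episodes of strictly safe policies (g_t < 0).
g = [1.0] * 10 + [-1.0] * 10
print(cumulative_violation(g))  # 0.0
print(positive_violation(g))    # 10.0
```

Under the cumulative notion, the ten strictly safe episodes fully offset the ten violating ones; under the positive notion they do not, which is why bounds on positive violation are the stronger guarantee.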