LG MLFeb 16

Replicable Constrained Bandits

Matteo Bollini, Gianmarco Genalti, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi

arXiv:2602.14580v12.71 citationsh-index: 11

Originality Incremental advance

AI Analysis

This addresses the need for reproducible experiments in machine learning, particularly for constrained MAB problems, though it is incremental as it extends replicability to a new setting.

The paper tackles the problem of ensuring algorithmic replicability in constrained multi-armed bandits (MABs), where decisions must be reproducible across executions while maximizing reward and satisfying constraints, and shows that replicable algorithms can achieve regret and constraint violation bounds matching non-replicable ones in terms of T.

Algorithmic \emph{replicability} has recently been introduced to address the need for reproducible experiments in machine learning. A \emph{replicable online learning} algorithm is one that takes the same sequence of decisions across different executions in the same environment, with high probability. We initiate the study of algorithmic replicability in \emph{constrained} MAB problems, where a learner interacts with an unknown stochastic environment for $T$ rounds, seeking not only to maximize reward but also to satisfy multiple constraints. Our main result is that replicability can be achieved in constrained MABs. Specifically, we design replicable algorithms whose regret and constraint violation match those of non-replicable ones in terms of $T$. As a key step toward these guarantees, we develop the first replicable UCB-like algorithm for \emph{unconstrained} MABs, showing that algorithms that employ the optimism in-the-face-of-uncertainty principle can be replicable, a result that we believe is of independent interest.

View on arXiv PDF

Similar