LG MLNov 1, 2025

Stochastic Shortest Path with Sparse Adversarial Costs

Emmeran Johnson, Alberto Rumi, Ciara Pike-Burke, Patrick Rebeschini

arXiv:2511.00637v14.1h-index: 14

Originality Highly original

AI Analysis

This work addresses a theoretical limitation in online learning for sparse adversarial environments, offering an optimal method for known transitions but showing limited benefits in unknown transitions.

The paper tackles the adversarial Stochastic Shortest Path problem with sparse costs, showing that existing methods based on negative-entropy regularization are non-adaptive to sparsity and achieve regret scaling with √log S, while proposing ℓ_r-norm regularizers that adapt to sparsity and achieve optimal regret scaling with √log M, where M is the number of costly state-action pairs.

We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known transition setting, existing bounds based on Online Mirror Descent (OMD) with negative-entropy regularization scale with $\sqrt{\log S A}$, where $SA$ is the size of the state-action space. While we show that this is optimal in the worst-case, this bound fails to capture the benefits of sparsity when only a small number $M \ll SA$ of state-action pairs incur cost. In fact, we also show that the negative-entropy is inherently non-adaptive to sparsity: it provably incurs regret scaling with $\sqrt{\log S}$ on sparse problems. Instead, we propose a family of $\ell_r$-norm regularizers ($r \in (1,2)$) that adapts to the sparsity and achieves regret scaling with $\sqrt{\log M}$ instead of $\sqrt{\log SA}$. We show this is optimal via a matching lower bound, highlighting that $M$ captures the effective dimension of the problem instead of $SA$. Finally, in the unknown transition setting the benefits of sparsity are limited: we prove that even on sparse problems, the minimax regret for any learner scales polynomially with $SA$.

View on arXiv PDF

Similar