LGMLSep 30, 2025

Lipschitz Bandits with Stochastic Delayed Feedback

arXiv:2510.00309v1h-index: 2
Originality Incremental advance
AI Analysis

This work addresses a practical challenge in online learning for scenarios with delayed feedback, such as recommendation systems, by extending Lipschitz bandits to handle stochastic delays, though it is incremental as it builds on existing bandit frameworks.

The paper tackles the Lipschitz bandit problem with stochastic delayed feedback, where rewards are observed after random delays, and proposes algorithms for bounded and unbounded delays that achieve sublinear regret, with bounded delays retaining near-optimal performance and unbounded delays being nearly optimal up to logarithmic factors.

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit in the presence of stochastic delayed feedback, where the rewards are not observed immediately but after a random delay. We consider both bounded and unbounded stochastic delays, and design algorithms that attain sublinear regret guarantees in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $τ_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and establish a regret lower bound showing that our method is nearly optimal up to logarithmic factors. Finally, we present experimental results to demonstrate the efficiency of our algorithms under various delay scenarios.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes