MLLGMay 28

Instance-dependent Stochastic Lipschitz bandit

arXiv:2605.2974856.8
Predicted impact top 20% in ML · last 90 daysOriginality Incremental advance
AI Analysis

Provides a more refined instance-dependent regret analysis for Lipschitz bandits, benefiting researchers in online learning and optimization.

The paper introduces a new algorithm and analysis for Lipschitz bandits that achieves regret bounds adapting to local growth of level sets, improving over classical zooming-based bounds. For maximizers of dimension d*>0, they obtain rates of order O~(T^{(d_z+1)/max(d_z,d*)+2}), strictly improving existing results.

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tildeΘ \left ( T^{d+1/d+2}\right )$, or adaptive via the zooming dimension $d_z$, yielding $\tildeΘ \left ( T^{d_z+1/d_z+2}\right )$. However, such zooming-based guarantees are only partially instance-dependent, as they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterizes the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star>0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}} \left ( T^{d_z+1 / \max(d_z,d^\star)+2}\right )$ strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes