MLLGSTOct 29, 2025

Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

arXiv:2510.25811v1h-index: 1Has Code
Originality Incremental advance
AI Analysis

This work addresses a specific challenge in bandit theory for researchers and practitioners dealing with multimodal reward structures, representing an incremental advance by making existing theoretical solutions practical.

The paper tackles the stochastic multi-armed bandit problem with multimodal expected reward functions, proposing the first computationally tractable algorithm for computing the Graves-Lai optimization solution, which enables asymptotically optimal algorithms for this problem.

We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes