ML LGJun 16, 2025

Experimental Design for Semiparametric Bandits

arXiv:2506.13390v27.81 citationsh-index: 2COLT

Originality Highly original

AI Analysis

This work addresses the problem of efficient learning in complex bandit problems for researchers and practitioners in the field of online learning and decision-making.

The authors tackled the problem of semiparametric bandits, achieving a minimax regret of $ ilde{O}(sqrt{dT})$ and logarithmic regret under certain conditions. Their approach simultaneously offers sharp regret bounds, PAC bounds, and best-arm identification guarantees.

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.

View on arXiv PDF

Similar