LGOct 8, 2025

Best-of-Both Worlds for linear contextual bandits with paid observations

Nathan Boyer, Dorian Baudry, Patrick Rebeschini

arXiv:2510.07424v24.1h-index: 14

Originality Incremental advance

AI Analysis

This work addresses a specific challenge in bandit algorithms for scenarios with costly observations, offering a robust solution that adapts to different environments, though it is incremental as it builds on existing frameworks.

The paper tackles the problem of linear contextual bandits with paid observations, where a learner selects actions to minimize loss and can pay to observe arm losses, and introduces a Best-of-Both-Worlds algorithm that achieves minimax-optimal regret of Θ(T^{2/3}) in adversarial settings and poly-logarithmic regret in stochastic regimes.

We study the problem of linear contextual bandits with paid observations, where at each round the learner selects an action in order to minimize its loss in a given context, and can then decide to pay a fixed cost to observe the loss of any arm. Building on the Follow-the-Regularized-Leader framework with efficient estimators via Matrix Geometric Resampling, we introduce a computationally efficient Best-of-Both-Worlds (BOBW) algorithm for this problem. We show that it achieves the minimax-optimal regret of $Θ(T^{2/3})$ in adversarial settings, while guaranteeing poly-logarithmic regret in (corrupted) stochastic regimes. Our approach builds on the framework from \cite{BOBWhardproblems} to design BOBW algorithms for ``hard problem'', using analysis techniques tailored for the setting that we consider.

View on arXiv PDF

Similar