LGFeb 28

Lookahead identification in adversarial bandits: accuracy and memory bounds

Nataly Brukhim, Nicolò Cesa-Bianchi, Carlo Ciliberto

arXiv:2603.00803v11.4h-index: 19

Originality Highly original

AI Analysis

This addresses the challenge of making reliable predictions in adversarial environments for bandit learners, offering a novel theoretical framework with implications for sequential decision-making under uncertainty.

The paper tackles the problem of identifying a near-optimal arm for future time windows in adversarial multi-armed bandits, where past performance is unreliable, and shows that an algorithm achieves an accuracy of ε = O(1/√log T) over Ω(√T) windows, with a lower bound of Ω(1/log T).

We study an identification problem in multi-armed bandits. In each round a learner selects one of $K$ arms and observes its reward, with the goal of eventually identifying an arm that will perform best at a {\it future} time. In adversarial environments, however, past performance may offer little information about the future, raising the question of whether meaningful identification is possible at all. In this work, we introduce \emph{lookahead identification}, a task in which the goal of the learner is to select a future prediction window and commit in advance to an arm whose average reward over that window is within $\varepsilon$ of optimal. Our analysis characterizes both the achievable accuracy of lookahead identification and the memory resources required to obtain it. From an accuracy standpoint, for any horizon $T$ we give an algorithm achieving $\varepsilon = O\bigl(1/\sqrt{\log T}\bigr)$ over $Ω(\sqrt{T})$ prediction windows. This demonstrates that, perhaps surprisingly, identification is possible in adversarial settings, despite significant lack of information. We also prove a near-matching lower bound showing that $\varepsilon = Ω\bigl(1/\log T\bigr)$ is unavoidable. We then turn to investigate the role of memory in our problem, first proving that any algorithm achieving nontrivial accuracy requires $Ω(K)$ bits of memory. Under a natural \emph{local sparsity} condition, we show that the same accuracy guarantees can be achieved using only poly-logarithmic memory.

View on arXiv PDF

Similar