ML IT LG ST | Dec 23, 2025

Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability

arXiv:2512.20368v1 | 12.33 citations | h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses reliable statistical inference on data collected by bandit algorithms. It gives researchers and practitioners a method that avoids the 'price of adaptivity' while keeping near-minimax regret. The advance is incremental because it builds on the existing Lai-Wei stability concept.

The paper tackles statistical inference in linear contextual bandits, where adaptive data collection complicates the construction of confidence intervals. It proposes a penalized EXP4 algorithm that satisfies the Lai-Wei stability condition. As a result, Wald-type confidence intervals for linear functionals are asymptotically valid without the usual $\sqrt{d \log T}$ inflation factor, and the algorithm attains regret that is minimax optimal up to logarithmic factors.

Statistical inference in contextual bandits is complicated by the adaptive, non-i.i.d. nature of the data. A growing body of work has shown that classical least-squares inference may fail under adaptive sampling, and that constructing valid confidence intervals for linear functionals of the model parameter typically requires paying an unavoidable inflation of order $\sqrt{d \log T}$. This phenomenon -- often referred to as the price of adaptivity -- highlights the inherent difficulty of reliable inference under general contextual bandit policies. A key structural property that circumvents this limitation is the \emph{stability} condition of Lai and Wei, which requires the empirical feature covariance to concentrate around a deterministic limit. When stability holds, the ordinary least-squares estimator satisfies a central limit theorem, and classical Wald-type confidence intervals -- designed for i.i.d. data -- become asymptotically valid even under adaptation, \emph{without} incurring the $\sqrt{d \log T}$ price of adaptivity. In this paper, we propose and analyze a penalized EXP4 algorithm for linear contextual bandits. Our first main result shows that this procedure satisfies the Lai--Wei stability condition and therefore admits valid Wald-type confidence intervals for linear functionals. Our second result establishes that the same algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors, demonstrating that stability and statistical efficiency can coexist within a single contextual bandit method. Finally, we complement our theory with simulations illustrating the empirical normality of the resulting estimators and the sharpness of the corresponding confidence intervals.
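For readers unfamiliar with the classical interval the abstract refers to, the following is a minimal sketch of a Wald-type confidence interval for a linear functional $a^\top \theta$ built from the ordinary least-squares estimator. It is not the paper's penalized EXP4 procedure: the data are simulated i.i.d. purely for illustration, and the function name `wald_interval`, the noise level, the sample size, and the choice of $a$ are all assumptions made for this example. Under adaptive sampling, the same formula would be justified only when a stability condition of the kind described above holds.

```python
import numpy as np
from statistics import NormalDist


def wald_interval(X, y, a, alpha=0.05):
    """Wald-type confidence interval for the linear functional a^T theta.

    X: (n, d) array of observed feature vectors, one row per round.
    y: (n,) array of observed rewards.
    a: (d,) vector defining the functional of interest.
    """
    n, d = X.shape
    gram = X.T @ X                              # unnormalized empirical feature covariance
    theta_hat = np.linalg.solve(gram, X.T @ y)  # ordinary least-squares estimate
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (n - d)        # residual-based noise variance estimate
    se = np.sqrt(sigma2_hat * (a @ np.linalg.solve(gram, a)))
    z = NormalDist().inv_cdf(1 - alpha / 2)     # standard normal quantile
    center = a @ theta_hat
    return center - z * se, center + z * se


# Illustrative i.i.d. data (hypothetical values, not taken from the paper).
rng = np.random.default_rng(0)
theta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(2000, 3))
y = X @ theta + rng.normal(scale=0.5, size=2000)
a = np.array([1.0, 0.0, 0.0])                   # picks out the first coordinate of theta

lo, hi = wald_interval(X, y, a)
print(f"95% interval for a^T theta: [{lo:.3f}, {hi:.3f}]")
```

The interval width here scales with $\sqrt{a^\top (X^\top X)^{-1} a}$ and carries no extra $\sqrt{d \log T}$ factor, which is the sense in which the abstract describes the classical interval as avoiding the price of adaptivity.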
