Efficient Inference after Directionally Stable Adaptive Experiments
This work addresses reliable inference in adaptive experiments, a problem central to fields such as online learning and clinical trials. The contribution is incremental, since it builds on existing stability concepts.
The paper tackles efficient statistical inference on scalar targets after adaptive data collection, such as by bandit algorithms. It introduces directional stability, a target-specific condition weaker than previously imposed target-agnostic stability conditions. Under this condition, estimators that would be efficient with i.i.d. data remain asymptotically normal and semiparametrically efficient. Verifying the condition for LinUCB yields the first semiparametric efficiency guarantee for a regular scalar target under this sampling method.
We study inference on scalar-valued pathwise differentiable targets after adaptive data collection, such as by a bandit algorithm. We introduce a novel target-specific condition, directional stability, which is strictly weaker than previously imposed target-agnostic stability conditions. Under directional stability, we show that estimators that would have been efficient under i.i.d. data remain asymptotically normal and semiparametrically efficient when computed from adaptively collected trajectories. The canonical gradient has a martingale form, and directional stability guarantees stabilization of its predictable quadratic variation, enabling high-dimensional asymptotic normality. We characterize efficiency using a convolution theorem for the adaptive-data setting, and give a condition under which the one-step estimator attains the efficiency bound. We verify directional stability for LinUCB, yielding the first semiparametric efficiency guarantee for a regular scalar target under LinUCB sampling.
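To make the setting concrete, here is a minimal Python sketch of what "inference on a scalar target after LinUCB sampling" looks like in practice. It rests on several assumptions that are not taken from the paper: a two-dimensional linear reward model with Gaussian noise of known scale `sigma`, a hand-rolled LinUCB with ridge parameter `lam` and exploration weight `alpha`, a fixed three-arm action set, and the scalar target c^T theta. The helper names (`linucb_then_estimate`, `inv2`, `matvec`, `dot`) are hypothetical. The estimate is the least-squares (ridge) fit of c^T theta, which is efficient for this target under i.i.d. sampling in this Gaussian linear model, and the interval uses the plug-in standard error sigma * sqrt(c^T V^{-1} c). Whether that normal approximation remains valid under adaptive sampling is exactly the kind of question directional stability is meant to settle; the sketch does not check the condition, and it is not the paper's estimator or its verification argument.

```python
import math
import random


def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def dot(u, v):
    """Inner product of two 2-vectors."""
    return u[0] * v[0] + u[1] * v[1]


def linucb_then_estimate(theta, arms, c, T=2000, alpha=1.0, lam=1.0, sigma=1.0, seed=0):
    """Hypothetical illustration: run LinUCB for T rounds on a linear bandit,
    then estimate the scalar target c^T theta from the adaptively collected data.
    Assumes rewards y = x^T theta + N(0, sigma^2) with sigma known."""
    rng = random.Random(seed)
    V = [[lam, 0.0], [0.0, lam]]  # ridge Gram matrix: lam*I + sum_t x_t x_t^T
    b = [0.0, 0.0]                # sum_t x_t * y_t
    for _ in range(T):
        Vinv = inv2(V)
        theta_hat = matvec(Vinv, b)

        # LinUCB index: estimated reward plus exploration bonus alpha * ||x||_{V^{-1}}
        def ucb(x):
            return dot(x, theta_hat) + alpha * math.sqrt(dot(x, matvec(Vinv, x)))

        x = max(arms, key=ucb)                   # adaptive (deterministic) arm choice
        y = dot(x, theta) + rng.gauss(0.0, sigma)
        for i in range(2):
            b[i] += x[i] * y
            for j in range(2):
                V[i][j] += x[i] * x[j]
    Vinv = inv2(V)
    theta_hat = matvec(Vinv, b)
    est = dot(c, theta_hat)                              # point estimate of c^T theta
    se = sigma * math.sqrt(dot(c, matvec(Vinv, c)))      # plug-in standard error
    return est, se
```

For example, with `arms = [(1.0, 0.0), (0.0, 1.0), (0.7071, 0.7071)]`, `theta = (1.0, 0.2)` and `c = (1.0, 0.0)`, the call `linucb_then_estimate(theta, arms, c)` returns an estimate of theta_1 together with a standard error, from which a nominal 95% interval is `est ± 1.96 * se`. The design choice of a deterministic policy like LinUCB matters here: its propensities are 0 or 1, so inverse-propensity weighting is unavailable, and the validity of such intervals has to come from a stability argument of the kind the abstract describes.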