Optimal Conditional Inference in Adaptive Experiments
This work addresses statistical inference for researchers and practitioners running adaptive experiments, such as those driven by bandit algorithms. Its contribution is incremental: it builds on existing conditional inference frameworks.
The paper studies optimal conditional inference in adaptive experiments, where the stopping time, assignment probabilities, and target parameter are all chosen adaptively from the data. It shows that, absent further restrictions on the experiment, inference using only the last batch's results is optimal. It then identifies the additional information available when the adaptive choices are location-invariant, and derives computationally tractable, optimal procedures when those choices depend on the data only through polyhedral events.
We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.
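As a rough illustration of the first result, the sketch below computes a confidence interval from the final batch alone. It is not the paper's procedure: it assumes a two-arm experiment, takes the target parameter to be the difference in arm means, and uses a plain normal approximation. The function name `last_batch_ci` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def last_batch_ci(arm_a, arm_b, level=0.95):
    """Normal-approximation CI for mean(arm_a) - mean(arm_b).

    Only final-batch outcomes are passed in; earlier batches are ignored,
    since they were used to choose the stopping time, assignment
    probabilities, and target parameter.
    """
    diff = mean(arm_a) - mean(arm_b)
    # Standard error of a difference of two independent sample means.
    se = sqrt(variance(arm_a) / len(arm_a) + variance(arm_b) / len(arm_b))
    # Two-sided critical value for the requested coverage level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical outcomes from the final batch of a two-arm experiment.
final_a = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3]
final_b = [0.7, 0.4, 0.9, 0.6, 0.8, 0.5]

lo, hi = last_batch_ci(final_a, final_b)
print(f"95% CI for the arm difference: ({lo:.3f}, {hi:.3f})")
```

The interval uses only final-batch data because, conditional on the adaptive choices made from earlier batches, those earlier outcomes have already been spent on selection. The location-invariant and polyhedral cases described above recover additional information that this simple sketch discards.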