LG · ML · Jun 23, 2025

Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments

arXiv:2506.18744v2 · 3 citations · h-index: 32 · KDD
Originality: Incremental advance
AI Analysis

This work addresses a challenge faced by decision-makers in internet systems, such as recommender systems, who need to tune policies efficiently without waiting for long-running experiments. It is an incremental improvement on existing sequential experimentation methods.

The paper tackles the problem of optimizing long-term outcomes in online experiments, which typically require lengthy sequential tests, by proposing a novel approach that combines fast experiments and offline proxies with slow experiments to perform Bayesian optimization quickly.

Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers generally wish to optimize for long-term treatment effects of the system changes, which often requires running experiments for a long time as short-term measurements can be misleading due to non-stationarity in treatment effects over time. The sequential experimentation strategies--which typically involve several iterations--can be prohibitively long in such cases. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) and/or offline proxies (e.g., off-policy evaluation) with long-running, slow experiments to perform sequential, Bayesian optimization over large action spaces in a short amount of time.
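The idea of using fast measurements to stand in for slow ones can be illustrated with a deliberately simplified sketch. This is not the paper's method: the abstract describes Bayesian optimization, whereas the code below swaps in a plain least-squares calibration with no uncertainty modelling. The arm names, effect values, and the helper `fit_linear_calibration` are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_linear_calibration(fast, slow):
    """Ordinary least-squares fit of slow ~= intercept + slope * fast."""
    fast_mean, slow_mean = mean(fast), mean(slow)
    sxx = sum((f - fast_mean) ** 2 for f in fast)
    sxy = sum((f - fast_mean) * (s - slow_mean) for f, s in zip(fast, slow))
    slope = sxy / sxx
    intercept = slow_mean - slope * fast_mean
    return intercept, slope


# Hypothetical data: every arm has a cheap, possibly biased short-term estimate.
fast_effects = {"a": 0.10, "b": 0.30, "c": 0.20, "d": 0.50, "e": 0.40}
# Only a few arms have finished a long-running (slow) experiment.
slow_effects = {"a": 0.05, "b": 0.15, "c": 0.10}

# Calibrate the fast signal against the slow one on arms that have both.
paired = list(slow_effects)
intercept, slope = fit_linear_calibration(
    [fast_effects[arm] for arm in paired],
    [slow_effects[arm] for arm in paired],
)

# Predict the long-term effect for every arm, preferring measured values.
predicted = {
    arm: slow_effects.get(arm, intercept + slope * fast_effects[arm])
    for arm in fast_effects
}

# Choose the next arm to send to a slow experiment: the untested arm
# with the highest predicted long-term effect.
untested = [arm for arm in fast_effects if arm not in slow_effects]
next_arm = max(untested, key=predicted.get)
print(next_arm, round(predicted[next_arm], 3))
```

A greedy rule like this only exploits the calibrated prediction. A Bayesian optimization loop would also account for uncertainty in the prediction when choosing which arm to run next.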

