LG, ML · Jun 3, 2025

Multi-Metric Adaptive Experimental Design under Fixed Budget with Validation

Qining Zhang, Tanner Fiez, Yi Liu, Wenyang Liu

arXiv:2506.03062v17.11 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the statistical power problems of online A/B testing with multiple metrics, offering an incremental improvement over existing adaptive designs such as sequential halving.
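For reference, the baseline being extended, sequential halving (Karnin et al., 2013), fits in a few lines of Python. This is a minimal simulation under stated assumptions: Gaussian rewards, a single metric, and hypothetical names (`sequential_halving`, `means`, `sigmas`). It is the classic SH baseline, not the paper's SHRVar.

```python
import math
import random


def sequential_halving(means, sigmas, budget, seed=0):
    """Classic fixed-budget sequential halving on simulated Gaussian arms.

    means, sigmas: hypothetical true arm parameters, used only to simulate rewards.
    budget: total number of samples T.
    Returns the index of the arm that survives every round.
    """
    rng = random.Random(seed)
    survivors = list(range(len(means)))
    rounds = math.ceil(math.log2(len(means)))  # number of halving rounds
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split T evenly across rounds, then evenly across the surviving arms.
        per_arm = max(1, budget // (len(survivors) * rounds))
        est = {}
        for arm in survivors:
            draws = [rng.gauss(means[arm], sigmas[arm]) for _ in range(per_arm)]
            est[arm] = sum(draws) / per_arm  # empirical mean reward
        # Keep the better half (rounded up) by empirical mean.
        survivors.sort(key=lambda arm: est[arm], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]
```

With a clear gap, e.g. `sequential_halving([0.0, 0.1, 1.0, 0.2], [1.0] * 4, 4000)`, arm 2 survives. Note that the even split ignores per-arm variance, which is exactly the gap that variance-aware variants target.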

The paper tackles adaptive experimental design with multiple metrics and heterogeneous variances under a fixed budget. It proposes a two-phase framework that combines adaptive exploration with a validation A/B test, and introduces SHRVar, a method whose error probability provably decreases exponentially and which shows superior performance in numerical experiments.
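The validation phase amounts to an ordinary A/B test of the selected treatment against control on fresh traffic. The following is a minimal sketch, assuming per-metric observations are held in dicts and using a normal approximation with an unequal-variance (Welch-type) standard error. The function name `validate_ab` and the data layout are hypothetical, and the paper's exact validation test may differ.

```python
import math
from statistics import NormalDist, mean, variance


def validate_ab(treatment, control, alpha=0.05):
    """Hypothetical per-metric validation test for the winning treatment.

    treatment, control: dict mapping metric name -> list of observations
    (at least two per arm, since sample variance is used).
    Returns dict metric -> (ate_estimate, z_value, (ci_low, ci_high)).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    results = {}
    for metric in treatment:
        t, c = treatment[metric], control[metric]
        ate = mean(t) - mean(c)  # difference-in-means estimate of the ATE
        # Unequal-variance standard error: each arm keeps its own variance.
        se = math.sqrt(variance(t) / len(t) + variance(c) / len(c))
        results[metric] = (ate, ate / se, (ate - z_crit * se, ate + z_crit * se))
    return results
```

For instance, `validate_ab({"revenue": [2, 3, 4, 5]}, {"revenue": [1, 2, 3, 4]})` estimates an ATE of 1.0 whose 95% interval still covers zero, so four samples per arm would not validate the treatment.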

Standard A/B tests in online experiments face statistical power challenges when testing multiple candidates simultaneously, while adaptive experimental designs (AED) alone fall short at inferring experiment statistics such as the average treatment effect, especially with many metrics (e.g., revenue, safety) and heterogeneous variances. This paper proposes a fixed-budget multi-metric AED framework with a two-phase structure: an adaptive exploration phase to identify the best treatment, and a validation phase with an A/B test to verify the treatment's quality and infer statistics. We propose SHRVar, which generalizes sequential halving (SH) (Karnin et al., 2013) with a novel relative-variance-based sampling and an elimination strategy built on reward z-values. Its error probability provably decreases exponentially, where the exponent generalizes the complexity measures for SH (Karnin et al., 2013) and SHVar (Lalitha et al., 2023) under homogeneous and heterogeneous variances, respectively. Numerical experiments verify our analysis and demonstrate the superior performance of this new framework.
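The SH complexity measure that the paper's exponent generalizes is H2 = max_i i / Δ_(i)^2 (Karnin et al., 2013), where Δ_(i) is the gap between the best arm and the i-th best arm. The sketch below computes H2 and the fixed-budget bound 3 log2(n) exp(-T / (8 H2 log2 n)) from that paper. It assumes at least two arms, a unique best arm, and rewards in [0, 1] as in the original analysis. The function names are hypothetical, and SHRVar's multi-metric, variance-aware exponent is not reproduced here.

```python
import math


def sh_complexity_h2(means):
    """H2 = max_i i / Delta_(i)^2, arms sorted by mean in descending order.

    Delta_(i) is the gap to the best arm; Delta_(1) is set to Delta_(2).
    Assumes a unique best arm (otherwise a gap is zero).
    """
    mu = sorted(means, reverse=True)
    gaps = [mu[0] - m for m in mu]
    gaps[0] = gaps[1]  # convention: the best arm borrows the second-best gap
    return max((i + 1) / gaps[i] ** 2 for i in range(len(mu)))


def sh_error_bound(means, budget):
    """Upper bound on SH's probability of missing the best arm:
    3 * log2(n) * exp(-T / (8 * H2 * log2(n))), assuming n >= 2 arms."""
    n = len(means)
    h2 = sh_complexity_h2(means)
    return 3 * math.log2(n) * math.exp(-budget / (8 * h2 * math.log2(n)))
```

For means `[1.0, 0.5, 0.5, 0.0]`, H2 = 12, driven by the two arms tied at gap 0.5; doubling the budget squares the exponential factor, which is the exponential decay the abstract refers to.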

