LG | Aug 8, 2024

AExGym: Benchmarks and Environments for Adaptive Experimentation

Jimmy Wang, Ethan Che, Daniel R. Jiang, Hongseok Namkoong

arXiv:2408.04531v1 | 2.6 | h-index: 20 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the inefficiency of static randomized trials by giving practitioners in science and industry tools for adaptive experimentation. The contribution is incremental, since it builds on existing adaptive design concepts.

To address the limited adoption of adaptive experimental designs, the authors created AExGym, a benchmark and open-source library built on real-world datasets. It targets practical challenges such as non-stationarity and delayed feedback, and aims to spur methodological development focused on robustness.

Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop custom environments and algorithms.
