ML LG AP ME · Dec 30, 2024

Post Launch Evaluation of Policies in a High-Dimensional Setting

Shima Nassiri, Mohsen Bayati, Joe Cooprider

arXiv:2501.00119v1 · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the challenge of evaluating policies efficiently for industries like e-commerce and ride-sharing, though it is incremental as it builds on existing synthetic control methods.

The paper tackles the cost of A/B testing in high-dimensional settings with millions of units. It proposes a synthetic control method that combines nearest neighbor matching with supervised learning to estimate counterfactual outcomes, reports improved accuracy in six large-scale experiments, and identifies and addresses machine learning bias.

A/B tests, also known as randomized controlled trials (RCTs), are the gold standard for evaluating the impact of new policies, products, or decisions. However, these tests can be costly in time and resources, and they can expose users, customers, or other test subjects (units) to inferior options. This paper explores practical considerations in applying methodologies inspired by "synthetic control" as an alternative to traditional A/B testing in settings with up to hundreds of millions of units, which are common in modern applications such as e-commerce and ride-sharing platforms. The method is particularly valuable when the treatment affects only a subset of units, leaving many units unaffected. In these scenarios, synthetic control methods use data from the unaffected units to estimate counterfactual outcomes for the treated units. After the treatment is implemented, these estimates can be compared to actual outcomes to measure the treatment effect. A key challenge in creating accurate counterfactual outcomes is interpolation bias, a well-documented phenomenon that occurs when control units differ significantly from treated units. To address this, we propose a two-phase approach: first, nearest neighbor matching on unit covariates selects similar control units; second, supervised learning methods suited to high-dimensional data estimate the counterfactual outcomes. Tests on six large-scale experiments demonstrate that this approach improves estimate accuracy. However, our analysis reveals that machine learning bias, which arises from methods that trade bias for variance reduction, can impact results and affect conclusions about treatment effects. We document this bias in large-scale experimental settings and propose effective de-biasing techniques to address it.
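The two-phase approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`match_controls`, `fit_ridge`, `estimate_effect`), the choice of ridge regression as the supervised learner, the Euclidean distance, and the synthetic data are all assumptions made for illustration. The brute-force distance computation is also only practical for small samples; at the scale the paper targets, an approximate nearest neighbor index would presumably be needed.

```python
import numpy as np

def match_controls(X_treated, X_control, k=5):
    """Phase 1 (assumed form): indices of the union of the k nearest
    control units, by Euclidean distance on covariates, per treated unit."""
    # Pairwise squared distances, shape (n_treated, n_control).
    d2 = ((X_treated[:, None, :] - X_control[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.unique(nearest)  # sorted, de-duplicated control indices

def fit_ridge(A, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.
    Ridge is a stand-in here for any high-dimensional supervised learner."""
    A_mean, y_mean = A.mean(axis=0), y.mean()
    Ac, yc = A - A_mean, y - y_mean
    w = np.linalg.solve(Ac.T @ Ac + alpha * np.eye(A.shape[1]), Ac.T @ yc)
    return w, y_mean - A_mean @ w

def estimate_effect(X_t, pre_t, post_t, X_c, pre_c, post_c, k=5, alpha=1.0):
    """Phase 2 (assumed form): learn post-period outcomes from pre-period
    outcomes on matched controls, predict treated counterfactuals, and
    average the gap between actual and predicted outcomes."""
    idx = match_controls(X_t, X_c, k)
    w, b = fit_ridge(pre_c[idx], post_c[idx], alpha)
    counterfactual = pre_t @ w + b
    return float(np.mean(post_t - counterfactual))

# Synthetic demo (hypothetical data, true effect set to 2.0).
rng = np.random.default_rng(0)
n_c, n_t, true_effect = 2000, 50, 2.0
B = rng.normal(size=(3, 4))               # covariates -> pre-period outcomes
beta = np.array([0.1, 0.2, 0.3, 0.4])     # pre-period -> post-period outcome

X_c = rng.normal(size=(n_c, 3))
X_t = rng.normal(loc=0.5, size=(n_t, 3))  # treated units are shifted
pre_c = X_c @ B + 0.1 * rng.normal(size=(n_c, 4))
pre_t = X_t @ B + 0.1 * rng.normal(size=(n_t, 4))
post_c = pre_c @ beta + 0.1 * rng.normal(size=n_c)
post_t = pre_t @ beta + 0.1 * rng.normal(size=n_t) + true_effect

estimate = estimate_effect(X_t, pre_t, post_t, X_c, pre_c, post_c)
print(f"estimated effect: {estimate:.3f}")
```

The ridge penalty `alpha` is one example of the bias-for-variance trade-off the abstract mentions: a larger penalty shrinks the coefficients and can shift the counterfactual predictions, and with them the estimated effect. The de-biasing techniques the authors propose are not reproduced here.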
