Meta-Analysis of Randomized Experiments with Applications to Heavy-Tailed Response Data
This work addresses a central obstacle in evaluating treatment effect estimators: the lack of ground truth. It offers researchers and practitioners in fields such as supply chain management a practical approach for heavy-tailed data, though the contribution is incremental, building on existing cross-validation and aggregation ideas.
The paper tackles the problem of assessing treatment effect estimators in randomized control trials (RCTs) without ground truth. It proposes a cross-validation-like method that uses unbiased difference-of-means estimates as pseudo-labels and aggregates the results across multiple RCTs. Evaluated on 699 Amazon supply chain RCTs, the method indicates that downweighting or truncating large values in heavy-tailed data improves estimation accuracy: the variance reduction outweighs the bias these procedures introduce.
A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance. In this paper, we propose a novel cross-validation-like methodology to address this challenge. The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a ground truth "label" on a portion of the RCT, to test the performance of an estimator trained on the other portion. We combine this insight with an aggregation scheme, which borrows statistical strength across a large collection of RCTs, to present an end-to-end methodology for judging an estimator's ability to recover the underlying treatment effect as well as produce an optimal treatment "roll out" policy. We evaluate our methodology across 699 RCTs implemented in the Amazon supply chain. In this heavy-tailed setting, our methodology suggests that procedures that aggressively downweight or truncate large values, while introducing bias, lower the variance enough to ensure that the treatment effect is more accurately estimated.
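The held-out "label" idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 50/50 random split, the squared-error score, the 95th-percentile cap, and the simulated Pareto outcomes are all assumptions made for illustration. Each RCT is split in half, a candidate estimator is fit on one half, the difference-of-means on the other half serves as the noisy label, and the errors are averaged across many RCTs.

```python
import numpy as np

def diff_in_means(y, t):
    """Unbiased difference-of-means estimate of the treatment effect."""
    return y[t == 1].mean() - y[t == 0].mean()

def winsorized_diff_in_means(y, t, q=0.95):
    """Hypothetical biased estimator: cap outcomes at the pooled q-quantile."""
    cap = np.quantile(y, q)
    return diff_in_means(np.minimum(y, cap), t)

def held_out_score(estimator, y, t, rng, n_splits=10):
    """Mean squared gap between the estimate on one half of an RCT and the
    difference-of-means "label" on the other half, over random splits."""
    n = len(y)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, test = perm[: n // 2], perm[n // 2:]
        estimate = estimator(y[train], t[train])
        label = diff_in_means(y[test], t[test])  # noisy but unbiased
        errors.append((estimate - label) ** 2)
    return float(np.mean(errors))

def simulate_rct(rng, n=2000, tau=0.5):
    """Assumed toy data: heavy-tailed Pareto outcomes plus an additive effect."""
    t = rng.integers(0, 2, size=n)
    y = rng.pareto(1.5, size=n) + tau * t
    return y, t

rng = np.random.default_rng(0)
rcts = [simulate_rct(rng) for _ in range(50)]
estimators = {
    "diff_in_means": diff_in_means,
    "winsorized": winsorized_diff_in_means,
}

# Aggregate: average the held-out score over the whole collection of RCTs.
scores = {
    name: float(np.mean([held_out_score(est, y, t, rng) for y, t in rcts]))
    for name, est in estimators.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Because the label is unbiased and computed on data independent of the training half, the expected score equals the estimator's mean squared error plus the variance of the label. That extra term is the same for every estimator, so comparing scores still ranks estimators by accuracy in expectation. A single RCT gives a very noisy score under heavy tails, which is why the sketch averages over many RCTs.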