Ensemble Method for Estimating Individualized Treatment Effects
The proposed ensemble offers a robust approach for medical and business applications such as clinical trials and A/B testing, though the contribution is incremental because it builds on existing model aggregation approaches.
The paper tackles the challenge of estimating individualized treatment effects when ground-truth effects are unobservable. It proposes an ensemble algorithm that aggregates estimates from a diverse library of models, reports that ensembling beats model selection almost every time across 43 benchmark datasets, and proves that the ensemble is asymptotically at least as accurate as the best candidate model.
In many medical and business applications, researchers are interested in estimating individualized treatment effects using data from a randomized experiment. For example, in medical applications doctors learn treatment effects from clinical trials, and in technology companies researchers learn them from A/B testing experiments. Although dozens of machine learning models have been proposed for this task, it is challenging to determine which model will be best for the problem at hand because ground-truth treatment effects are unobservable. In contrast to several recent papers proposing methods to select one of these competing models, we propose an algorithm for aggregating the estimates from a diverse library of models. We compare ensembling to model selection on 43 benchmark datasets and find that ensembling wins almost every time. Theoretically, we prove that our ensemble model is (asymptotically) at least as accurate as the best model under consideration, even if the number of candidate models is allowed to grow with the sample size.
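To make the idea of aggregating candidate estimates concrete, here is a minimal Python sketch. It is not the paper's algorithm, which the abstract does not spell out. It assumes a randomized experiment with a known treatment probability p, uses the standard transformed outcome Y* = Y(W - p) / (p(1 - p)) as an unbiased stand-in for the unobservable effect, and fits non-negative weights that sum to one with a Frank-Wolfe loop. The function names, the synthetic data, and the three hard-coded candidate estimates are all hypothetical.

```python
import random


def transformed_outcome(y, w, p=0.5):
    """Y* = Y (W - p) / (p (1 - p)). With randomized treatment and known p,
    E[Y* | X] equals the treatment effect at X (assumption of this sketch)."""
    return [yi * (wi - p) / (p * (1 - p)) for yi, wi in zip(y, w)]


def mse(target, pred):
    return sum((t - q) ** 2 for t, q in zip(target, pred)) / len(target)


def fit_convex_weights(preds, target, n_iter=200):
    """Frank-Wolfe with exact line search over the probability simplex.
    preds[k][i] is candidate k's effect estimate for unit i."""
    k_models = len(preds)
    # Start at the single best candidate, so the loss can only go down.
    best = min(range(k_models), key=lambda k: mse(target, preds[k]))
    weights = [0.0] * k_models
    weights[best] = 1.0
    current = list(preds[best])
    for _ in range(n_iter):
        resid = [c - t for c, t in zip(current, target)]
        # Gradient w.r.t. weight k is proportional to <resid, preds[k]>.
        grads = [sum(r * f for r, f in zip(resid, pk)) for pk in preds]
        j = min(range(k_models), key=lambda k: grads[k])
        direction = [f - c for f, c in zip(preds[j], current)]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        # Exact minimizer of the quadratic loss along the segment, clipped to [0, 1].
        step = -sum(r * d for r, d in zip(resid, direction)) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        weights = [(1.0 - step) * wk for wk in weights]
        weights[j] += step
        current = [c + step * d for c, d in zip(current, direction)]
    return weights


# Synthetic randomized experiment (hypothetical): true effect is 1 + 2x.
random.seed(0)
n = 2000
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
w = [random.randint(0, 1) for _ in range(n)]
y = [xi + wi * (1.0 + 2.0 * xi) + random.gauss(0.0, 1.0) for xi, wi in zip(x, w)]
target = transformed_outcome(y, w, p=0.5)

# Hypothetical candidate estimates, standing in for models fitted elsewhere.
candidates = [
    [2.0 for _ in x],
    [4.0 * xi for xi in x],
    [-xi for xi in x],
]

weights = fit_convex_weights(candidates, target)
ensemble_pred = [sum(wk * c[i] for wk, c in zip(weights, candidates)) for i in range(n)]
losses = [mse(target, c) for c in candidates]
ensemble_loss = mse(target, ensemble_pred)

print("weights:", [round(wk, 3) for wk in weights])
print("candidate losses:", [round(v, 3) for v in losses])
print("ensemble loss:", round(ensemble_loss, 3))
```

In this sketch the weights are fitted and scored on the same units, so the comparison with the individual candidates is in-sample. A more careful version would fit the candidates and the weights on separate folds.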