LG ME · Dec 20, 2022

Out-of-sample scoring and automatic selection of causal estimators

Egor Kraev, Timo Flesch, Hudson Taylor Lekunze, Mark Harley, Pere Planell Morell

arXiv:2212.10076v1 · 1.81 citations · h-index: 7 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a practical bottleneck for causal inference practitioners by providing a way to score and optimize models out of sample. The advance is incremental because it builds on existing libraries and methods.

The paper tackles the problem of selecting and tuning causal estimators for Conditional Average Treatment Effect (CATE) and instrumental variable (IV) problems by proposing novel out-of-sample scoring methods, enabling hyperparameter optimization and model selection, and demonstrates on synthetic data that this approach yields estimates close to true impact.

Recently, many causal estimators for Conditional Average Treatment Effect (CATE) and instrumental variable (IV) problems have been published and open sourced, making it possible to estimate the granular impact of both randomized treatments (such as A/B tests) and of user choices on the outcomes of interest. However, the practical application of such models has been hampered by the lack of a valid way to score their performance out of sample in order to select the best one for a given application. We address that gap by proposing novel scoring approaches for both the CATE case and an important subset of instrumental variable problems, namely those where the instrumental variable is customer access to a product feature, and the treatment is the customer's choice to use that feature. Being able to score model performance out of sample allows us to apply hyperparameter optimization methods to causal model selection and tuning. We implement this in an open source package that relies on the DoWhy and EconML libraries for the implementation of causal inference models (and also includes a Transformed Outcome model implementation), and on FLAML for hyperparameter optimization and for the component models used in the causal models. We demonstrate on synthetic data that optimizing the proposed scores is a reliable method for choosing the model and its hyperparameter values, and that the resulting estimates are close to the true impact, in the randomized CATE and IV cases. Further, we provide examples of applying these methods to real customer data from Wise.
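
The abstract does not spell out the proposed scores, so the following is only a minimal sketch of the general idea of out-of-sample scoring for CATE models in the randomized case, not the paper's method. It assumes a known treatment probability `p` and uses the standard transformed outcome, whose conditional expectation equals the CATE under randomization, as a noisy held-out target. The function names, the synthetic data, and the three candidate predictions are all hypothetical and chosen for illustration.

```python
import numpy as np


def transformed_outcome(y, t, p):
    """Return Y* = Y * (T - p) / (p * (1 - p)).

    With treatment randomized at known probability p, E[Y* | X] is the CATE,
    so Y* can serve as a noisy target on held-out data.
    """
    return y * (t - p) / (p * (1.0 - p))


def transformed_outcome_mse(cate_pred, y, t, p):
    """Mean squared error between predicted CATE and the transformed outcome.

    Lower is better; the noise term is the same for every candidate, so
    differences between scores reflect differences in CATE accuracy.
    """
    return float(np.mean((transformed_outcome(y, t, p) - cate_pred) ** 2))


# Synthetic randomized experiment (illustrative assumption, not paper data).
rng = np.random.default_rng(0)
n = 20_000
p = 0.5
x = rng.normal(size=n)
t = rng.binomial(1, p, size=n)
true_cate = 1.0 + 2.0 * x
y = x + true_cate * t + rng.normal(size=n)

# Score candidates on a held-out half of the sample only.
held_out = slice(n // 2, n)
x_te, y_te, t_te = x[held_out], y[held_out], t[held_out]

# Hypothetical candidate CATE predictions standing in for fitted estimators.
candidates = {
    "true_shape": 1.0 + 2.0 * x_te,
    "constant": np.full(x_te.shape, 1.0),
    "wrong_sign": -(1.0 + 2.0 * x_te),
}

scores = {
    name: transformed_outcome_mse(pred, y_te, t_te, p)
    for name, pred in candidates.items()
}
best = min(scores, key=scores.get)
print(scores)
print("selected:", best)
```

In a real workflow the candidate predictions would come from estimators fitted on the training half, and a score of this kind could be handed to a hyperparameter search as the objective to minimize.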
