A framework for causal segmentation analysis with machine learning in large-scale digital experiments
This work addresses the need for more precise causal analysis in digital experiments, so that treatments can be targeted at the subgroups that benefit from them. The contribution is incremental in that it builds on existing causal inference methods.
The authors address the problem of identifying user subgroups with differential treatment effects in large-scale digital experiments. The result is a model-agnostic framework that unifies segment discovery and causal impact evaluation, accompanied by an open-source R package implementation.
We present an end-to-end methodological framework for causal segment discovery that aims to uncover differential impacts of treatments across subgroups of users in large-scale digital experiments. Building on recent developments in causal inference and non- and semi-parametric statistics, our approach unifies two objectives: (1) the discovery of user segments that stand to benefit from a candidate treatment based on subgroup-specific treatment effects, and (2) the evaluation of causal impacts of dynamically assigning units to a study's treatment arm based on their predicted segment-specific benefit or harm. Our proposal is model-agnostic, capable of incorporating state-of-the-art machine learning algorithms into the estimation procedure, and is applicable to randomized A/B tests and quasi-experiments. An open-source R package implementation, sherlock, is introduced.
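
As a rough illustration of the two objectives above, the following Python sketch simulates a randomized A/B test, estimates segment-specific effects on one half of the data, and evaluates the resulting assignment rule on the other half. It is not the sherlock API and not the authors' estimator. A per-segment difference in means stands in for the machine-learning-based effect estimation, and a simple inverse-probability-weighted estimate stands in for the evaluation step. The segment names, effect sizes, and function names (simulate, segment_effects, rule_value) are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical true effects: one segment benefits, one is unaffected, one is harmed.
TRUE_EFFECT = {"new": 1.0, "casual": 0.0, "power": -1.0}


def simulate(n, seed):
    """Simulate a randomized A/B test with a known 50/50 assignment."""
    rng = random.Random(seed)
    segments = list(TRUE_EFFECT)
    rows = []
    for _ in range(n):
        seg = rng.choice(segments)
        arm = 1 if rng.random() < 0.5 else 0
        outcome = 2.0 + arm * TRUE_EFFECT[seg] + rng.gauss(0.0, 1.0)
        rows.append((seg, arm, outcome))
    return rows


def segment_effects(rows):
    """Objective (1): treated-minus-control mean outcome within each segment."""
    effects = {}
    for seg in sorted({s for s, _, _ in rows}):
        treated = [y for s, a, y in rows if s == seg and a == 1]
        control = [y for s, a, y in rows if s == seg and a == 0]
        effects[seg] = mean(treated) - mean(control)
    return effects


def rule_value(rows, rule, p_treat=0.5):
    """Objective (2): inverse-probability-weighted estimate of the mean
    outcome if every unit were assigned according to `rule`."""
    total = 0.0
    for seg, arm, y in rows:
        if arm == rule[seg]:
            # Probability of the arm this unit actually received.
            prob = p_treat if arm == 1 else 1.0 - p_treat
            total += y / prob
    return total / len(rows)


# Separate samples for discovery and evaluation, so the rule is not
# evaluated on the data used to learn it.
discovery = simulate(20000, seed=1)
evaluation = simulate(20000, seed=2)

effects = segment_effects(discovery)
# Assign to treatment only the segments with a positive estimated effect.
rule = {seg: 1 if eff > 0 else 0 for seg, eff in effects.items()}
treat_all = {seg: 1 for seg in effects}

value_rule = rule_value(evaluation, rule)
value_all = rule_value(evaluation, treat_all)

print("estimated segment effects:", effects)
print("learned rule:", rule)
print("estimated value, learned rule:", value_rule)
print("estimated value, treat everyone:", value_all)
```

By construction of this simulation, the true mean outcome is 7/3 under the rule that treats only the benefiting segment and 2 under treating everyone, so the two estimates should land near those values. In the framework itself, the per-segment difference in means would be replaced by flexible machine learning estimators of the subgroup-specific effects. The known 50/50 assignment probability used here also holds only for a randomized test, not for a quasi-experiment.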