LG AP ML

Oct 29, 2020

Targeting for long-term outcomes

Jeremy Yang, Dean Eckles, Paramveer Dhillon, Sinan Aral

arXiv:2010.15835v2

15.057 citations

Originality: Incremental advance

AI Analysis

This work addresses a challenge facing decision-makers in fields such as marketing and policy who need to optimize interventions for long-term outcomes. It offers a practical solution with demonstrated financial gains, though it builds incrementally on existing surrogacy and policy learning methods.

The authors tackle the problem of targeting interventions to maximize long-term outcomes, which are often not yet observed when decisions must be made. Their method imputes the missing long-term outcomes and then learns a targeting policy on the imputed outcomes using a doubly-robust approach. In large-scale experiments at The Boston Globe, the policy learned on imputed outcomes performed statistically indistinguishably from a policy learned on ground-truth outcomes, and the approach had a net-positive revenue impact of $4-5 million over three years.

Decision makers often want to target interventions so as to maximize an outcome that is observed only in the long-term. This typically requires delaying decisions until the outcome is observed or relying on simple short-term proxies for the long-term outcome. Here we build on the statistical surrogacy and policy learning literatures to impute the missing long-term outcomes and then approximate the optimal targeting policy on the imputed outcomes via a doubly-robust approach. We first show that conditions for the validity of average treatment effect estimation with imputed outcomes are also sufficient for valid policy evaluation and optimization; furthermore, these conditions can be somewhat relaxed for policy optimization. We apply our approach in two large-scale proactive churn management experiments at The Boston Globe by targeting optimal discounts to its digital subscribers with the aim of maximizing long-term revenue. Using the first experiment, we evaluate this approach empirically by comparing the policy learned using imputed outcomes with a policy learned on the ground-truth, long-term outcomes. The performance of these two policies is statistically indistinguishable, and we rule out large losses from relying on surrogates. Our approach also outperforms a policy learned on short-term proxies for the long-term outcome. In a second field experiment, we implement the optimal targeting policy with additional randomized exploration, which allows us to update the optimal policy for future subscribers. Over three years, our approach had a net-positive revenue impact in the range of $4-5 million compared to the status quo.
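The pipeline described in the abstract (impute the long-term outcome from surrogates, then learn a policy from doubly-robust scores) can be illustrated with a minimal sketch. This is not the authors' implementation. The simulated data, the linear surrogate index, the known propensity of 0.5, the single-covariate threshold policy class, and all names (`simulate`, `fit_linear`, `dr_score`, `best_threshold`) are assumptions made for illustration, and the sketch skips cross-fitting for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns the coefficients."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    """Apply coefficients from fit_linear to new features."""
    design = np.column_stack([np.ones(features.shape[0]), features])
    return design @ coef


def simulate(n, randomized):
    """Hypothetical toy data: covariate x, treatment w, surrogate s, outcome y."""
    x = rng.uniform(-1.0, 1.0, size=n)
    w = rng.integers(0, 2, size=n) if randomized else np.zeros(n, dtype=int)
    # Treatment raises the short-term surrogate when x > 0 and lowers it when x < 0.
    s = x + w * x + rng.normal(0.0, 0.5, size=n)
    # Surrogacy assumption: treatment affects y only through s (given x).
    y = 2.0 * s + 0.5 * x + rng.normal(0.0, 0.5, size=n)
    return x, w, s, y


# Step 1: fit a surrogate index on historical data where y is observed.
x_hist, _, s_hist, y_hist = simulate(20_000, randomized=False)
index_coef = fit_linear(np.column_stack([s_hist, x_hist]), y_hist)

# Step 2: in the experiment, y is not yet observed, so impute it from (s, x).
x, w, s, _ = simulate(20_000, randomized=True)
y_imputed = predict_linear(index_coef, np.column_stack([s, x]))

# Step 3: doubly-robust score for each arm, computed on the imputed outcome.
propensity = 0.5  # known by design in a randomized experiment


def dr_score(arm):
    in_arm = w == arm
    # Outcome model for this arm, fit only on units assigned to it.
    mu = predict_linear(fit_linear(x[in_arm], y_imputed[in_arm]), x)
    # Outcome-model prediction plus an inverse-propensity-weighted residual.
    return mu + in_arm / propensity * (y_imputed - mu)


gamma_control, gamma_treated = dr_score(0), dr_score(1)

# Step 4: pick the threshold policy "treat if x > c" with the highest
# estimated value, i.e. the mean DR score of the arm the policy assigns.
candidates = np.linspace(-1.0, 1.0, 41)
values = [np.mean(np.where(x > c, gamma_treated, gamma_control)) for c in candidates]
best_threshold = candidates[int(np.argmax(values))]
print(f"learned threshold: {best_threshold:.2f}")
```

In this toy setup the true treatment effect on the long-term outcome is `2 * x`, so the best threshold policy treats units with `x > 0`. The learned threshold should land near zero even though the long-term outcome in the experiment is never used.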
