Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data
This work addresses a key challenge for researchers and practitioners in personalized medicine and digital platform analytics who must work with biased observational data and weak instruments.
The paper tackles the problem of estimating conditional average treatment effects (CATEs) when the observational data suffer from unobserved confounding and the instrumental variable (IV) data suffer from low compliance. It develops a two-stage framework that combines both data sources through a compliance-weighted correction, and validates it through simulations and a real-world analysis of the effect of 401(k) plan participation on wealth.
Accurately predicting conditional average treatment effects (CATEs) is crucial in personalized medicine and digital platform analytics. Since the treatments of interest often cannot be directly randomized, observational data is leveraged to learn CATEs, but this approach can incur significant bias from unobserved confounding. One strategy to overcome these limitations is to leverage instrumental variables (IVs) as latent quasi-experiments, such as randomized intent-to-treat assignments or randomized product recommendations. This approach, on the other hand, can suffer from low compliance, $\textit{i.e.}$, IV weakness. Some subgroups may even exhibit zero compliance, meaning we cannot instrument for their CATEs at all. In this paper, we develop a novel approach to combine IV and observational data to enable reliable CATE estimation in the presence of unobserved confounding in the observational data and low compliance in the IV data, including no compliance for some subgroups. We propose a two-stage framework that first learns $\textit{biased}$ CATEs from the observational data, and then applies a compliance-weighted correction using IV data, effectively leveraging IV strength variability across covariates. We characterize the convergence rates of our method and validate its effectiveness through a simulation study. Additionally, we demonstrate its utility with real data by analyzing the heterogeneous effects of 401(k) plan participation on wealth.
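The two-stage idea can be illustrated with a deliberately simplified sketch. It is not the authors' estimator. It assumes discrete subgroups $g$, a confounding bias $b$ that is constant across subgroups, and that within each subgroup the instrument's effect on the outcome equals compliance times the CATE, $\delta_g = \gamma_g \tau_g$. Under those assumptions, the bias in the observational estimates $\tau^{O}_g$ can be fit by least squares on the IV data, where each subgroup is effectively weighted by its squared compliance $\gamma_g^2$. The function name and all argument names are hypothetical.

```python
from typing import List, Sequence


def compliance_weighted_correction(
    tau_obs: Sequence[float],
    delta_y: Sequence[float],
    gamma: Sequence[float],
    n: Sequence[int],
) -> List[float]:
    """Correct biased per-subgroup CATEs using IV data (illustrative only).

    tau_obs[g]: stage-1 CATE estimate from observational data (biased).
    delta_y[g]: effect of the instrument on the outcome in subgroup g.
    gamma[g]:   compliance, i.e. effect of the instrument on treatment uptake.
    n[g]:       number of IV samples in subgroup g.

    Assumed model (a simplification): delta_y[g] = gamma[g] * (tau_obs[g] - b)
    with a single constant bias b shared by all subgroups.
    """
    # Closed-form minimizer of sum_g n[g] * (delta_y[g] - gamma[g] * (tau_obs[g] - b))**2.
    # Subgroups with gamma[g] == 0 contribute nothing to either sum.
    num = sum(n_g * g * (g * t - d) for t, d, g, n_g in zip(tau_obs, delta_y, gamma, n))
    den = sum(n_g * g * g for g, n_g in zip(gamma, n))
    if den == 0:
        raise ValueError("no subgroup has nonzero compliance; the bias is not identified")
    bias = num / den
    # Stage 2: subtract the estimated bias everywhere, including zero-compliance subgroups.
    return [t - bias for t in tau_obs]


# Noise-free toy inputs: true CATEs are 1, 2, 3 and the observational bias is 0.5.
tau_obs = [1.5, 2.5, 3.5]
gamma = [0.4, 0.1, 0.0]    # the third subgroup has zero compliance
delta_y = [0.4, 0.2, 0.0]  # gamma[g] * true CATE
n = [100, 100, 100]

corrected = compliance_weighted_correction(tau_obs, delta_y, gamma, n)
print(corrected)  # approximately [1.0, 2.0, 3.0]
```

In this toy setting the third subgroup could not be instrumented on its own, since a ratio $\delta_g / \gamma_g$ with $\gamma_g = 0$ is undefined. It still receives a corrected estimate because the bias is learned from the subgroups where the instrument has strength.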