Collaborative-controlled LASSO for Constructing Propensity Score-based Estimators in High-Dimensional Data
This work addresses propensity score model selection in high-dimensional observational data, a problem that has received little attention, and offers a collaborative LASSO-based method that improved point estimation and confidence interval coverage in quasi-experiments built on electronic healthcare data.
The paper tackles model selection for propensity score estimation in high-dimensional causal inference by introducing a collaborative-controlled LASSO approach, which considers covariate associations with both treatment and outcome in order to optimize the bias-variance trade-off in the estimated treatment effect. In quasi-experiments on electronic healthcare data, the method outperformed competing estimators in point estimation and confidence interval coverage, and the propensity score model it selected also yielded substantive improvements when plugged into other propensity score-based estimators.
Propensity score (PS)-based estimators are increasingly used for causal inference in observational studies. However, model selection for PS estimation in high-dimensional data has received little attention. In these settings, PS models have traditionally been selected based on goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative targeted minimum loss-based estimation (C-TMLE) is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a PS model. This "collaborative learning" considers variable associations with both treatment and outcome when selecting a PS model, in order to optimize the bias-variance trade-off in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for PS estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the PS model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, in which only the potential outcomes were manually generated and the treatment and baseline covariates remained unchanged. Results showed that the C-TMLE algorithm outperformed competing estimators for both point estimation and confidence interval coverage. In addition, the PS model selected by C-TMLE could be applied to other PS-based estimators, which also resulted in substantive improvement for both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.
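The collaborative idea can be illustrated with a heavily simplified sketch; it is not the algorithm proposed in the paper. It assumes simulated data, a deliberately crude initial outcome fit (arm-specific means), a linear TMLE fluctuation, and a single train/holdout split in place of cross-validation. All function names (`fit_l1_logistic`, `tmle`, `select_collaboratively`), the penalty grid, and the simulation design are hypothetical. The point it shows is that the LASSO penalty for the PS model is chosen by the holdout loss of the targeted outcome fit, rather than by how well the PS model predicts treatment.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, a, lam, steps=200, lr=0.5):
    """L1-penalized logistic regression via proximal gradient descent."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(steps):
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x))) - ai
                 for x, ai in zip(X, a)]
        b0 -= lr * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            z = b[j] - lr * sum(r * x[j] for r, x in zip(resid, X)) / n
            # Soft-thresholding applies the L1 penalty to coefficient j.
            b[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return b0, b

def propensity(model, X, bound=0.025):
    # Estimated PS, truncated away from 0 and 1 (the bound is an assumption).
    b0, b = model
    return [min(max(sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x))), bound),
                1.0 - bound) for x in X]

def tmle(X, a, y, lam):
    """Fit the PS at penalty lam, then target a crude initial outcome fit."""
    model = fit_l1_logistic(X, a, lam)
    g = propensity(model, X)
    # Initial outcome fit: the mean outcome within each treatment arm.
    q = {arm: sum(yi for yi, ai in zip(y, a) if ai == arm) / a.count(arm)
         for arm in (0, 1)}
    # Clever covariate H(A, W) = A / g(W) - (1 - A) / (1 - g(W)).
    h = [ai / gi - (1 - ai) / (1 - gi) for ai, gi in zip(a, g)]
    # Linear fluctuation: least-squares fit of the residual on H.
    eps = (sum(hi * (yi - q[ai]) for hi, yi, ai in zip(h, y, a))
           / sum(hi * hi for hi in h))
    # Plug-in effect estimate: mean of Q*(1, W) - Q*(0, W).
    ate = q[1] - q[0] + eps * sum(1 / gi + 1 / (1 - gi) for gi in g) / len(g)
    return model, q, eps, ate

def holdout_loss(fit, X, a, y):
    # Squared-error loss of the targeted outcome fit on held-out data.
    model, q, eps, _ = fit
    g = propensity(model, X)
    return sum((yi - q[ai] - eps * (ai / gi - (1 - ai) / (1 - gi))) ** 2
               for ai, gi, yi in zip(a, g, y)) / len(y)

def select_collaboratively(X, a, y, lambdas, n_train):
    train = (X[:n_train], a[:n_train], y[:n_train])
    valid = (X[n_train:], a[n_train:], y[n_train:])
    losses = {lam: holdout_loss(tmle(*train, lam), *valid) for lam in lambdas}
    best = min(losses, key=losses.get)  # chosen by outcome loss, not PS fit
    return best, tmle(X, a, y, best)[3], losses

def simulate(n, seed=0):
    # w[0] confounds; w[1] affects treatment only; the rest are noise.
    rng = random.Random(seed)
    X, a, y = [], [], []
    for _ in range(n):
        w = [rng.gauss(0, 1) for _ in range(5)]
        ai = 1 if rng.random() < sigmoid(0.8 * w[0] + 1.5 * w[1]) else 0
        X.append(w)
        a.append(ai)
        y.append(1.0 * ai + 2.0 * w[0] + rng.gauss(0, 1))  # true effect is 1.0
    return X, a, y

LAMBDAS = [0.2, 0.1, 0.05, 0.02, 0.005]
X, a, y = simulate(300)
best_lam, ate, losses = select_collaboratively(X, a, y, LAMBDAS, n_train=200)
print("selected lambda:", best_lam, "TMLE estimate:", round(ate, 3))
```

In this toy setup, a selector driven purely by treatment prediction would favor keeping the strong treatment-only covariate `w[1]`, even though it contributes variance but no confounding control; scoring candidates by the targeted outcome fit is one way to let the causal parameter inform that choice.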