ME CO ML · Jun 30, 2017

Collaborative-controlled LASSO for Constructing Propensity Score-based Estimators in High-Dimensional Data

Cheng Ju, Richard Wyss, Jessica M. Franklin, Sebastian Schneeweiss, Jenny Häggström, Mark J. van der Laan

arXiv:1706.10029v1 · 4.33 citations

Originality: Highly original

AI Analysis

This work addresses a critical gap in causal inference for observational studies with high-dimensional data, offering a novel method that enhances accuracy and reliability for researchers and practitioners in fields like healthcare.

The paper tackles model selection for propensity score estimation in high-dimensional causal inference. It introduces a collaborative-controlled LASSO approach that considers covariate associations with both treatment and outcome, so that the propensity score model is chosen to balance bias against variance in the estimated treatment effect. In quasi-experiments built on electronic healthcare data, the method outperformed competing estimators in both point estimation and confidence interval coverage, and the propensity score model it selected also substantively improved other propensity score-based estimators.

Propensity score (PS) based estimators are increasingly used for causal inference in observational studies. However, model selection for PS estimation in high-dimensional data has received little attention. In these settings, PS models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative targeted minimum loss-based estimation (C-TMLE) is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a PS model. This "collaborative learning" considers variable associations with both treatment and outcome when selecting a PS model, in order to optimize the bias-variance trade-off in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for PS estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the PS model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, where only the potential outcomes were manually generated, and the treatment and baseline covariates remained unchanged. Results showed that the C-TMLE algorithm outperformed other competing estimators for both point estimation and confidence interval coverage. In addition, the PS model selected by C-TMLE could be applied to other PS-based estimators, which also resulted in substantive improvement for both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.
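The idea of choosing the LASSO penalty collaboratively can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the simulated data, the single train/validation split (in place of full cross-validation), the crude arm-mean initial outcome fit, the linear fluctuation for a continuous outcome, and all function names (`fit_l1_logistic`, `collaborative_lambda`, `tmle_ate`) are hypothetical choices made for illustration. It scores each candidate penalty by the held-out loss of the targeted outcome fit rather than by how well the PS model predicts treatment.

```python
import numpy as np

def fit_l1_logistic(X, a, lam, n_iter=300, step=0.5):
    """L1-penalised logistic regression fitted by proximal gradient descent."""
    beta, b0 = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        b0 -= step * np.mean(prob - a)            # intercept is not penalised
        beta = beta - step * (X.T @ (prob - a)) / len(a)
        # Soft-thresholding applies the L1 penalty and zeroes small coefficients.
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)
    return b0, beta

def propensity(X, b0, beta, bound=0.025):
    """Estimated P(A=1 | W), truncated away from 0 and 1 (bound is an assumption)."""
    g = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
    return np.clip(g, bound, 1.0 - bound)

def clever_covariate(a, g):
    """H(A, W) = A/g - (1-A)/(1-g), the covariate used in the targeting step."""
    return a / g - (1 - a) / (1 - g)

def collaborative_lambda(X, a, y, q_fit, lambdas, train, valid):
    """Select lambda by the validation loss of the targeted outcome fit."""
    losses = []
    resid = y - q_fit
    for lam in lambdas:
        b0, beta = fit_l1_logistic(X[train], a[train], lam)
        h = clever_covariate(a, propensity(X, b0, beta))
        eps = (h[train] @ resid[train]) / (h[train] @ h[train])  # fluctuation fit
        losses.append(np.mean((resid[valid] - eps * h[valid]) ** 2))
    losses = np.array(losses)
    return lambdas[int(np.argmin(losses))], losses

def tmle_ate(X, a, y, q1, q0, lam):
    """Targeted estimate of the average treatment effect for a given lambda."""
    g = propensity(X, *fit_l1_logistic(X, a, lam))
    h = clever_covariate(a, g)
    resid = y - np.where(a == 1, q1, q0)
    eps = (h @ resid) / (h @ h)
    return np.mean((q1 + eps / g) - (q0 - eps / (1 - g)))

# Simulated data (an assumption): column 0 is a confounder, column 1 affects
# treatment only, the remaining columns are noise; the true effect is 1.
rng = np.random.default_rng(0)
n, p = 600, 20
X = rng.normal(size=(n, p))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X[:, 0] + X[:, 1]))))
y = a + 2.0 * X[:, 0] + rng.normal(size=n)

# Deliberately crude initial outcome fit: the mean outcome within each arm.
q1 = np.full(n, y[a == 1].mean())
q0 = np.full(n, y[a == 0].mean())
q_fit = np.where(a == 1, q1, q0)

idx = rng.permutation(n)
train, valid = idx[: n // 2], idx[n // 2:]
lambdas = np.array([0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.001])

best_lam, losses = collaborative_lambda(X, a, y, q_fit, lambdas, train, valid)
ate = tmle_ate(X, a, y, q1, q0, best_lam)
print("selected lambda:", best_lam, "ATE estimate:", round(float(ate), 3))
```

The design point is the selection criterion: a penalty tuned only for treatment prediction has no reason to prefer confounders over covariates that predict treatment alone, whereas scoring each candidate through the outcome-side loss lets the outcome inform which PS model is kept.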

