Penalized Empirical Likelihood for Doubly Robust Causal Inference under Contamination in High Dimensions
This work addresses inferential challenges in causal inference for high-dimensional data and offers a robust solution for researchers in fields such as genomics, though it appears incremental because it builds on existing doubly robust and penalized methods.
The paper tackles the problem of estimating average treatment effects in high-dimensional, low-sample-size observational studies with contamination and model misspecification, proposing a doubly robust estimator that achieves superior performance in bias, error metrics, and interval calibration in simulations and gene expression datasets.
We propose a doubly robust estimator for the average treatment effect in high-dimensional, low-sample-size (HDLSS) observational studies, where contamination and model misspecification pose serious inferential challenges. The estimator combines bounded-influence estimating equations for outcome modeling with covariate balancing propensity scores for treatment assignment, embedded within a penalized empirical likelihood framework using nonconvex regularization. It satisfies the oracle property by jointly achieving consistency under partial model correctness, selection consistency, robustness to contamination, and asymptotic normality. For uncertainty quantification, we derive a finite-sample confidence interval using cumulant generating functions and influence function corrections, avoiding reliance on asymptotic approximations. Simulation studies and applications to gene expression datasets (Golub and Khan) demonstrate superior performance in bias, error metrics, and interval calibration, highlighting the method's robustness and inferential validity in HDLSS regimes. Notably, even in the absence of contamination, the proposed estimator and its confidence interval remain efficient relative to those of competing models.
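To make "consistency under partial model correctness" concrete, the following minimal sketch illustrates the doubly robust idea with the textbook augmented inverse-probability-weighted (AIPW) estimator. It is not the penalized empirical likelihood estimator proposed here: it uses no bounded-influence equations, no covariate balancing propensity scores, and no penalty. The function name `aipw_ate`, the simulated data, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def aipw_ate(y, t, e_hat, m1_hat, m0_hat):
    """Textbook AIPW estimate of the average treatment effect (illustrative)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    # Outcome-model prediction plus an inverse-probability-weighted residual correction.
    mu1 = m1_hat + t * (y - m1_hat) / e_hat
    mu0 = m0_hat + (1.0 - t) * (y - m0_hat) / (1.0 - e_hat)
    return float(np.mean(mu1 - mu0))

# Hypothetical simulated data: one confounder x, true ATE equal to 2.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-0.5 * x))   # true propensity score
t = rng.binomial(1, e_true)               # treatment indicator
y = 2.0 * t + x + rng.normal(size=n)      # outcome, confounded through x

# Deliberately misspecified outcome model: arm means that ignore x.
m1_bad = np.full(n, y[t == 1].mean())
m0_bad = np.full(n, y[t == 0].mean())

# Correct propensity model paired with the wrong outcome model.
ate_dr = aipw_ate(y, t, e_true, m1_bad, m0_bad)
# Naive difference in means, which is biased by confounding.
ate_naive = y[t == 1].mean() - y[t == 0].mean()

print(f"AIPW estimate:  {ate_dr:.3f}")
print(f"Naive estimate: {ate_naive:.3f}")
```

In this sketch the correctly specified propensity score compensates for the misspecified outcome model, so the AIPW estimate should land near the true effect of 2 while the naive difference in means does not. The same protection holds in the other direction, when the outcome model is correct and the propensity model is wrong.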