Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond
This work addresses efficient causal inference for researchers and practitioners who must control for many covariates or flexible relationships, offering a novel, incremental improvement to debiased machine learning methods.
The paper tackles the problem of estimating low-dimensional parameters, such as quantile treatment effects in causal inference, from estimating equations whose high-dimensional nuisances depend on the parameter. It proposes localized debiased machine learning (LDML), which avoids the burdensome step of learning entire conditional distributions. The result is a practically feasible estimator that provably achieves the same favorable asymptotic behavior as the infeasible estimator that uses the true nuisances, and its practicality is demonstrated in empirical studies.
We consider estimating a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter. A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances using flexible machine learning methods, but applying it to problems with parameter-dependent nuisances is impractical. For (L)QTE, DML requires that we learn the whole covariate-conditional cumulative distribution function. We instead propose localized debiased machine learning (LDML), which avoids this burdensome step and needs only to estimate nuisances at a single initial rough guess for the parameter. For (L)QTE, LDML involves learning just two regression functions, a standard task for machine learning methods. We prove that under lax rate conditions our estimator has the same favorable asymptotic behavior as the infeasible estimator that uses the unknown true nuisances. Thus, LDML notably enables practically feasible and theoretically grounded efficient estimation of important quantities in causal inference such as (L)QTEs when we must control for many covariates and/or flexible relationships, as we demonstrate in empirical studies.
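To make the localization step concrete, here is a minimal sketch for the simplest case, the tau-quantile theta of the treated potential outcome Y(1). It is an illustration under stated assumptions, not the authors' implementation. It assumes the standard doubly robust moment E[ T/e(X) * (1{Y <= theta} - F(theta | X)) + F(theta | X) - tau ] = 0, where e(X) = P(T = 1 | X) and F(theta | X) = P(Y <= theta | X, T = 1). Plain DML would need F(theta | X) for every candidate theta. LDML instead learns the single regression mu(X) = P(Y <= theta_init | X, T = 1) at a rough initial guess theta_init, so only two regressions (e and mu) are fit. The helper names (`fit_logistic`, `weighted_quantile`, `ldml_quantile_treated`) are hypothetical. The numpy logistic regression standing in for a flexible ML learner, the two-fold cross-fitting, the propensity clipping, and the whole-sample IPW quantile used as theta_init are all simplifying assumptions; a faithful implementation would follow the paper's own sample-splitting scheme. The QTE itself would be this estimate minus the analogous one for Y(0), obtained with 1 - T and 1 - e(X).

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Ridge-stabilized logistic regression fit by Newton's method.

    Hypothetical stand-in for a flexible ML classifier; returns a function
    that predicts P(y = 1 | X) for new rows.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ beta, -30, 30)))
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None]) + ridge * np.eye(len(beta))
        grad = Xb.T @ (y - p) - ridge * beta
        beta = beta + np.linalg.solve(hess, grad)

    def predict(X_new):
        Xn = np.column_stack([np.ones(len(X_new)), X_new])
        return 1.0 / (1.0 + np.exp(-np.clip(Xn @ beta, -30, 30)))

    return predict


def weighted_quantile(y, w, n, target):
    """Smallest y_j with (1/n) * sum_i w_i * 1{y_i <= y_j} >= target."""
    order = np.argsort(y)
    cum = np.cumsum(w[order]) / n
    idx = min(int(np.searchsorted(cum, target)), len(y) - 1)
    return y[order][idx]


def ldml_quantile_treated(Y, T, X, tau, n_folds=2, seed=0):
    """Sketch of LDML for the tau-quantile of Y(1) (simplified assumptions)."""
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    treated = T == 1

    # Nuisance 1: cross-fitted propensity e(X) = P(T = 1 | X).
    e_hat = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        e_hat[te] = fit_logistic(X[tr], T[tr])(X[te])
    e_hat = np.clip(e_hat, 0.01, 0.99)  # assumed trimming for stability
    w = T / e_hat  # inverse-propensity weights, zero for untreated units

    # Step 1: rough initial guess theta_init via an IPW quantile.
    theta_init = weighted_quantile(Y[treated], w[treated], n, tau)

    # Nuisance 2 (localized): mu(X) = P(Y <= theta_init | X, T = 1),
    # a single binary regression fit on treated units, cross-fitted.
    Z = (Y <= theta_init).astype(float)
    mu_hat = np.empty(n)
    for k in range(n_folds):
        tr = (folds != k) & treated
        te = folds == k
        mu_hat[te] = fit_logistic(X[tr], Z[tr])(X[te])

    # Step 2: solve (1/n) sum_i [w_i (1{Y_i <= theta} - mu_i) + mu_i - tau] = 0
    # for theta with mu held fixed at theta_init, i.e.
    # (1/n) sum_i w_i 1{Y_i <= theta} = tau + mean(w * mu) - mean(mu).
    target = tau + np.mean(w * mu_hat) - np.mean(mu_hat)
    return weighted_quantile(Y[treated], w[treated], n, target)
```

Because mu is held fixed at theta_init, the final equation is monotone and piecewise constant in theta, so solving it reduces to a weighted quantile of the treated outcomes rather than a search over a family of conditional CDFs. On simulated confounded data, for example X ~ N(0, 1), T drawn with P(T = 1 | X) = 1 / (1 + exp(-X)), and Y = X + T + N(0, 1) noise, the true median of Y(1) is 1, and `ldml_quantile_treated(Y, T, X, tau=0.5)` should land near it for a few thousand samples.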