Debiased Machine Learning without Sample-Splitting for Stable Estimators
This work addresses a methodological bottleneck for researchers and practitioners in causal inference by removing the need to split the sample, which reduces data requirements. The contribution is incremental, since it builds on existing debiased ML frameworks.
The paper tackles the requirement of sample splitting in debiased machine learning for causal parameter estimation. It shows that leave-one-out stability of the auxiliary estimators removes this requirement and enables sample re-use, which benefits moderately sized samples. Examples include ensemble bagged estimators built by sub-sampling without replacement.
Estimation and inference on causal parameters are typically reduced to a generalized method of moments problem, which involves auxiliary functions that correspond to solutions of a regression or classification problem. A recent line of work on debiased machine learning shows how one can use generic machine learning estimators for these auxiliary problems while maintaining root-$n$ consistency and asymptotic normality of the target parameter, requiring only mean-squared-error guarantees from the auxiliary estimation algorithms. The literature typically requires that these auxiliary problems be fitted on a separate sample or in a cross-fitting manner. We show that when the auxiliary estimation algorithms satisfy natural leave-one-out stability properties, sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes. For instance, we show that the proposed stability properties are satisfied by ensemble bagged estimators built via sub-sampling without replacement, a popular technique in machine learning practice.
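To make the idea concrete, here is a minimal sketch, not the paper's implementation, under several assumptions of our own: a partially linear model $Y = \theta D + g(X) + \varepsilon$, $D = m(X) + v$; the partialling-out orthogonal moment $\mathbb{E}[(Y - \ell(X) - \theta(D - m(X)))(D - m(X))] = 0$ with $\ell(X) = \mathbb{E}[Y \mid X]$; and a bagged k-nearest-neighbour regressor as the auxiliary learner. The helper names (`knn_predict`, `bagged_subsample_fit_predict`, `dml_plr_no_split`), the base learner, $k$, the number of ensemble members and the subsample size $s = \lfloor n^{0.7} \rfloor$ are hypothetical illustration choices. The key point is that both nuisance functions are fitted and evaluated on the same full sample, with no cross-fitting.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    # Hypothetical base learner: mean response of the k nearest training points.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def bagged_subsample_fit_predict(X, y, n_estimators, subsample_size, k, rng):
    # Ensemble average of base learners, each trained on a subsample drawn
    # WITHOUT replacement, then evaluated on the full sample (no splitting).
    n = X.shape[0]
    preds = np.zeros(n)
    for _ in range(n_estimators):
        idx = rng.choice(n, size=subsample_size, replace=False)
        preds += knn_predict(X[idx], y[idx], X, k)
    return preds / n_estimators

def dml_plr_no_split(Y, D, X, n_estimators=100, subsample_size=None, k=10, seed=0):
    # Sketch of a partialling-out DML estimator that re-uses the whole sample.
    rng = np.random.default_rng(seed)
    n = len(Y)
    if subsample_size is None:
        subsample_size = int(n ** 0.7)  # hypothetical choice, s much smaller than n
    l_hat = bagged_subsample_fit_predict(X, Y, n_estimators, subsample_size, k, rng)  # E[Y|X]
    m_hat = bagged_subsample_fit_predict(X, D, n_estimators, subsample_size, k, rng)  # E[D|X]
    u = Y - l_hat  # outcome residual
    v = D - m_hat  # treatment residual
    theta = np.sum(v * u) / np.sum(v * v)  # root of the empirical orthogonal moment
    # Plug-in sandwich standard error from the score psi = (u - theta * v) * v.
    psi = (u - theta * v) * v
    J = np.mean(v * v)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)
    return theta, se

# Simulated example (hypothetical data-generating process, true theta = 0.5).
rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(-1, 1, size=(n, 2))
D = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)
Y = 0.5 * D + np.cos(np.pi * X[:, 1]) + X[:, 0] + rng.normal(size=n)
theta, se = dml_plr_no_split(Y, D, X)
print(f"theta_hat = {theta:.3f}, se = {se:.3f}")
```

The design choice mirrors the intuition behind leave-one-out stability: each observation lands in only about a fraction $s/n$ of the subsamples, so its own noise has little influence on its own fitted nuisance values, which is the overfitting bias that cross-fitting normally guards against. The code does not verify the paper's stability conditions; it only illustrates the estimation pipeline they are meant to justify.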