Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity
Provides valid tests of residual dependence and goodness of fit in additive noise models when regression functions are estimated with flexible machine learning methods, correcting for first-stage bias.
The paper develops semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models, constructing a one-step estimator that yields bootstrap-calibrated tests and asymptotically efficient confidence intervals. Simulations show improved calibration and power over naive plug-in residual methods.
We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.
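To make the contrast with naive plug-in residual methods concrete, the following is a minimal sketch of one plausible form of such a baseline: a biased HSIC statistic (the squared Hilbert-Schmidt norm of the empirical kernel cross-covariance operator) computed between covariates and fitted residuals, with permutation calibration. It is not the one-step estimator described above and includes no first-stage bias correction. The function names, the Gaussian kernel, the fixed bandwidth, and the cubic polynomial regressor are all illustrative assumptions rather than choices taken from the paper.

```python
import numpy as np


def rbf_gram(values, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix for a 1-D or 2-D array of samples."""
    v = np.asarray(values, dtype=float)
    v = v.reshape(v.shape[0], -1)
    sq_norms = np.sum(v ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * v @ v.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def hsic_biased(K, L):
    """Biased HSIC estimate: trace(K H L H) / n^2 with centering matrix H."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return float(np.trace(K @ H @ L @ H)) / n ** 2


def poly_fit_predict(x, y, degree=3):
    """Hypothetical stand-in for a flexible regression: polynomial least squares."""
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x)


def plugin_residual_hsic(x, y, fit_predict=poly_fit_predict, bandwidth=1.0):
    """Naive plug-in statistic: HSIC between covariates and fitted residuals."""
    residuals = np.asarray(y, dtype=float) - fit_predict(x, y)
    return hsic_biased(rbf_gram(x, bandwidth), rbf_gram(residuals, bandwidth))


def permutation_pvalue(x, residuals, n_perm=200, bandwidth=1.0, seed=0):
    """Permutation p-value that treats the fitted residuals as if they were exact."""
    rng = np.random.default_rng(seed)
    K = rbf_gram(x, bandwidth)
    L = rbf_gram(residuals, bandwidth)
    observed = hsic_biased(K, L)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(residuals))
        # Permuting rows and columns of L breaks the pairing with x.
        if hsic_biased(K, L[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 100)
    # Heteroscedastic noise: the noise scale depends on the covariate.
    y = x ** 3 + np.abs(x) * rng.normal(size=100)
    residuals = y - poly_fit_predict(x, y)
    print("plug-in HSIC:", plugin_residual_hsic(x, y))
    print("permutation p-value:", permutation_pvalue(x, residuals))
```

Because the permutation step ignores the estimation error in the residuals, this baseline can be miscalibrated when the regression is fit flexibly, which is the first-stage bias problem the abstract describes.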