ME · LG · ST · ML · Apr 21, 2025

Deep learning with missing data

arXiv:2504.15388v2 · 3 citations · h-index: 39
Originality: Highly original
AI Analysis

This paper addresses missing data in machine learning applications, offering a robust method that is incremental but backed by strong theoretical guarantees.

The authors tackle multivariate nonparametric regression with missing covariates by proposing Pattern Embedded Neural Networks (PENNs), which combine imputed data with observation indicators. The estimator achieves the minimax rate of convergence up to a poly-logarithmic factor, and numerical experiments show consistent, often dramatic, improvements over standard neural networks.

In the context of multivariate nonparametric regression with missing covariates, we propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique. In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation. The outputs are then combined in a third neural network to produce final predictions. Our main theoretical result exploits an assumption that the observation patterns can be partitioned into cells on which the Bayes regression function behaves similarly, and belongs to a compositional Hölder class. It provides a finite-sample excess risk bound that holds for an arbitrary missingness mechanism, and in combination with a complementary minimax lower bound, demonstrates that our PENN estimator attains in typical cases the minimax rate of convergence as if the cells of the partition were known in advance, up to a poly-logarithmic factor in the sample size. Numerical experiments on simulated, semi-synthetic and real data confirm that the PENN estimator consistently improves, often dramatically, on standard neural networks without pattern embedding. Code to reproduce our experiments, as well as a tutorial on how to apply our method, is publicly available.
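The three-network structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the mean imputer, the concatenation used to combine the two branches, and all function names (`init_mlp`, `mlp`, `mean_impute`, `penn_forward`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def mean_impute(X):
    """Fill NaNs with column means of observed entries.
    Assumption: any other imputation technique could be substituted here."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def penn_forward(X, f_imputed, f_pattern, f_combine):
    """Hypothetical PENN-style forward pass on a batch X with NaNs for missing values."""
    mask = (~np.isnan(X)).astype(float)       # observation indicators: 1 = observed
    h_x = mlp(f_imputed, mean_impute(X))      # network 1: imputed covariates
    h_m = mlp(f_pattern, mask)                # network 2: compact pattern embedding
    h = np.concatenate([h_x, h_m], axis=1)    # assumed way of combining the outputs
    return mlp(f_combine, h).ravel()          # network 3: final predictions

d = 4                                         # number of covariates (illustrative)
f_imputed = init_mlp([d, 16, 8])
f_pattern = init_mlp([d, 8, 3])               # 3-dimensional embedding of the pattern
f_combine = init_mlp([8 + 3, 16, 1])

X = rng.normal(size=(5, d))
X[0, 1] = np.nan                              # introduce some missing entries
X[3, [0, 2]] = np.nan
preds = penn_forward(X, f_imputed, f_pattern, f_combine)
```

In this sketch the pattern branch sees only the 0/1 indicators, so two rows with the same missingness pattern receive the same embedding regardless of their observed values. In practice the three networks would be trained jointly on a regression loss.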

Code Implementations: 1 repo
