Nearly Optimal Bayesian Inference for Structural Missingness
This work addresses a problem that machine learning practitioners often face: handling missing data with complex dependencies. It offers a robust approach that propagates uncertainty instead of committing to a single imputation, which avoids overconfident decisions, though it builds incrementally on existing Bayesian methods.
The paper tackles structural missingness, where values can be undefined because of causal or logical constraints and where the missingness pattern can depend on both observed and unobserved variables. It proposes a Bayesian inference framework that decouples learning a posterior over the missing values from label prediction, so that uncertainty about the missing values propagates into the predictions. The method achieves state-of-the-art results on 43 classification and 15 imputation benchmarks, with finite-sample guarantees of near Bayes-optimality.
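The decoupling described above can be written as the standard posterior predictive with the missing block marginalized out. The notation is an assumption for illustration, not the paper's exact objective: $x_o$ and $x_m$ are the observed and missing parts of a row, $m$ is its missingness mask, $\mathcal{D}$ is the training data, and $S$ is the number of posterior draws.

```latex
\begin{aligned}
p(y \mid x_o, m, \mathcal{D})
  &= \int p(y \mid x_o, x_m, m, \mathcal{D})\,
          p(x_m \mid x_o, m, \mathcal{D})\, dx_m \\
  &\approx \frac{1}{S} \sum_{s=1}^{S} p\big(y \mid x_o, x_m^{(s)}, m, \mathcal{D}\big),
  \qquad x_m^{(s)} \sim p(x_m \mid x_o, m, \mathcal{D}), \\
p_{\text{plug-in}}(y \mid x_o, m, \mathcal{D})
  &= p\big(y \mid x_o, \hat{x}_m, m, \mathcal{D}\big),
  \qquad \hat{x}_m = \mathbb{E}[x_m \mid x_o, m, \mathcal{D}].
\end{aligned}
```

The second factor in the integral is the missing-value posterior and the first is the label predictor, so the two can be learned separately. Conditioning on $m$ is what lets the posterior express MNAR: under MAR, $p(m \mid x_o, x_m) = p(m \mid x_o)$, and Bayes' rule then gives $p(x_m \mid x_o, m) = p(x_m \mid x_o)$, so the mask carries no extra information about $x_m$.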
Structural missingness breaks the 'just impute and train' recipe: values can be undefined because of causal or logical constraints, and the missingness mask may depend on observed variables, unobserved variables (missing not at random, MNAR), and other missingness indicators. This creates three problems at once: (i) a catch-22 causal loop, since prediction needs the missing features, yet inferring them depends on the missingness mechanism; (ii) under MNAR, the unseen are different: the missing part can come from a shifted distribution; and (iii) plug-in imputation, where a single fill-in collapses the uncertainty about the missing values and can yield overconfident, biased decisions. In the Bayesian view, prediction via the posterior predictive distribution integrates over the full posterior uncertainty of the model rather than relying on a single point estimate. Our framework decouples (i) learning an in-model posterior over missing values from (ii) label prediction, optimizing the posterior predictive distribution so that predictions integrate over that posterior. This decoupling yields an in-model 'almost-free lunch': once the posterior is learned, prediction is plug-and-play while still propagating uncertainty. The framework achieves state-of-the-art (SOTA) results on 43 classification and 15 imputation benchmarks, with finite-sample near Bayes-optimality guarantees under our structural causal model (SCM) prior.
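Failure modes (ii) and (iii) can be seen in a toy setting. The sketch below is a minimal illustration under loudly stated assumptions, not the paper's method or its SCM prior: a single feature with a self-masking MNAR mechanism, a fixed logistic label model, and a Gaussian "posterior" moment-matched to the simulated missing values (possible here only because the data are simulated). All function names and constants (`simulate_self_masking`, `gaussian_sampler`, `posterior_predictive`, `a=3.0`, `w=2.0`) are hypothetical. With these settings, imputing the observed-data mean puts the prediction on the wrong side of 0.5, and a single fill-in at the posterior mean is more confident than averaging the predictor over posterior draws.

```python
import numpy as np


def sigmoid(z):
    # Logistic function; works on scalars and NumPy arrays.
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(x, w=2.0, b=0.0):
    # Hypothetical label model p(y=1 | x): a fixed logistic regression.
    return sigmoid(w * x + b)


def simulate_self_masking(n=100_000, a=3.0, seed=0):
    # Toy MNAR mechanism: x ~ N(0, 1) and P(missing | x) = sigmoid(a * x),
    # so larger values are more likely to be missing (self-masking).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    missing = rng.random(n) < sigmoid(a * x)
    return x, missing


def gaussian_sampler(mu, s):
    # Hypothetical stand-in for a learned posterior p(x_mis | mask) = N(mu, s^2).
    # Antithetic pairs (eps, -eps) keep the draws symmetric around mu.
    def sample(n, rng):
        eps = rng.standard_normal(n // 2)
        return mu + s * np.concatenate([eps, -eps])
    return sample


def posterior_predictive(sample_missing, predictor, n_samples=4000, seed=1):
    # Decoupled prediction: draw missing values from the posterior sampler,
    # then average the predictor's probabilities over the draws.
    rng = np.random.default_rng(seed)
    return predictor(sample_missing(n_samples, rng)).mean()


x, missing = simulate_self_masking()
obs_mean = x[~missing].mean()  # mean of the values we can see
mis_mean = x[missing].mean()   # shifted mean of the unseen values
mis_std = x[missing].std()

# Moment-matched Gaussian posterior, available only because the toy data are simulated.
posterior = gaussian_sampler(mis_mean, mis_std)

p_naive = predict_proba(obs_mean)   # fill in the observed-data mean (ignores MNAR)
p_plugin = predict_proba(mis_mean)  # single fill-in at the posterior mean
p_bayes = posterior_predictive(posterior, predict_proba)  # integrate over the posterior

print(f"observed mean {obs_mean:+.3f}, missing mean {mis_mean:+.3f}")
print(f"P(y=1): naive {p_naive:.3f}, plug-in {p_plugin:.3f}, integrated {p_bayes:.3f}")
```

Because `posterior_predictive` only sees a sampler and a predictor, either one can be swapped (a different posterior model, a different classifier) without touching the other; this is a toy analogue of the plug-and-play prediction described above, not the paper's implementation.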