ML · LG · Jun 3, 2013

Prediction with Missing Data via Bayesian Additive Regression Trees

arXiv:1306.0618v3 · 83 citations
AI Analysis

This work addresses the challenge of missing data for researchers and practitioners using tree-based models, offering an incremental improvement by integrating an existing missingness technique into BART.

The paper tackles missing data in non-parametric statistical learning by enhancing Bayesian Additive Regression Trees (BART) with a method that incorporates missingness directly into the trees' splitting rules, eliminating the need for imputation. In simulations under several missing-data models and on real data, the approach achieves higher predictive performance and greater stability than competing methods in many cases.

We present a method for incorporating missing data in non-parametric statistical learning without the need for imputation. We focus on a tree-based method, Bayesian Additive Regression Trees (BART), enhanced with "Missingness Incorporated in Attributes," a recently proposed approach for incorporating missingness into decision trees (Twala, 2008). This procedure takes advantage of the partitioning mechanisms found in tree-based models. Simulations on generated models and real data indicate that our proposed method can forecast well on complicated missing-at-random and not-missing-at-random models as well as models where missingness itself influences the response. Our procedure has higher predictive performance and is more stable than competitors in many cases. We also illustrate BART's abilities to incorporate missingness into uncertainty intervals and to detect the influence of missingness on the model fit.
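To make the partitioning idea concrete, the following is a minimal sketch of how a "Missingness Incorporated in Attributes" split can route observations. It assumes the three rule types commonly described for MIA: send missing values left at a threshold, send them right at a threshold, or split on missingness itself. The function names (`goes_left`, `best_mia_split`) and the greedy squared-error criterion are illustrative assumptions for a single tree; they are not the paper's implementation, and BART would draw such rules within its posterior sampler rather than search for them greedily.

```python
import math


def goes_left(value, threshold=None, missing_goes_left=True):
    """Route one observation under a MIA-style rule (hypothetical helper).

    threshold=None means the split is on missingness itself:
    missing values go left, observed values go right.
    """
    missing = value is None or math.isnan(value)
    if threshold is None:
        return missing
    if missing:
        return missing_goes_left
    return value <= threshold


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_mia_split(x, y):
    """Greedy illustration: pick the MIA rule with the smallest total SSE.

    Returns (threshold, missing_goes_left); threshold is None when the
    best rule splits on missingness itself.
    """
    observed = sorted({v for v in x if v is not None and not math.isnan(v)})
    candidates = [(None, True)]  # split on missingness itself
    for c in observed[:-1]:  # the largest value would leave the right child without observed values
        candidates.append((c, True))   # missing values sent left
        candidates.append((c, False))  # missing values sent right

    def cost(rule):
        left = [yi for xi, yi in zip(x, y) if goes_left(xi, *rule)]
        right = [yi for xi, yi in zip(x, y) if not goes_left(xi, *rule)]
        return sse(left) + sse(right)

    return min(candidates, key=cost)


if __name__ == "__main__":
    nan = float("nan")
    # Toy data in which the response depends only on whether x is missing.
    x = [1.0, 2.0, nan, 4.0, nan]
    y = [0.0, 0.0, 5.0, 0.0, 5.0]
    print(best_mia_split(x, y))
```

On this toy data the response depends only on whether `x` is missing, so the rule that splits on missingness separates the two groups exactly. This is the situation the abstract describes as missingness itself influencing the response, and it needs no imputed values.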

