ME AI LG, Apr 7, 2024

Review for Handling Missing Data with special missing mechanism

arXiv:2404.04905v1, 29 citations, h-index: 18
Originality: Synthesis-oriented
AI Analysis

The paper addresses missing-data handling for data analysts and researchers, but its contribution is incremental: it reviews and synthesizes existing literature without introducing new methods.

This review tackles the challenge of handling missing data, particularly focusing on the less-explored Missing At Random (MAR) and Missing Not At Random (MNAR) mechanisms, by comparing existing methods and identifying research gaps to guide data analysts and researchers in real-world applications.

Missing data poses a significant challenge in data science, affecting decision-making processes and outcomes. Understanding what missing data is, how it occurs, and why it must be handled appropriately is paramount when working with real-world data, especially tabular data, one of the most commonly used data types. Three missing mechanisms are defined in the literature: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), each presenting unique challenges in imputation. Most existing work focuses on MCAR, which is relatively easy to handle. The special missing mechanisms of MNAR and MAR are less explored and understood. This article reviews existing literature on handling missing values. It compares and contrasts existing methods in terms of their ability to handle different missing mechanisms and data types. It identifies research gaps in the existing literature and lays out potential directions for future research in the field. The information in this review will help data analysts and researchers adopt and promote good practices for handling missing data in real-world problems.
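The three mechanisms named in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the review: the variables `age` and `income`, the logistic missingness model, its coefficients, and the 30% MCAR rate are all assumptions chosen for the example. It follows the standard definitions: under MCAR the chance that a value is missing is constant, under MAR it depends only on a fully observed variable, and under MNAR it depends on the value that goes missing.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here as an assumed missingness model."""
    return 1.0 / (1.0 + np.exp(-z))


def make_masks(x_obs, y, rng):
    """Return boolean masks (True = y is missing) for MCAR, MAR and MNAR.

    x_obs is a fully observed covariate; y is the variable that loses values.
    The coefficients 2.0 and -1.0 and the 0.3 rate are illustrative choices.
    """
    n = len(y)
    zx = (x_obs - x_obs.mean()) / x_obs.std()
    zy = (y - y.mean()) / y.std()
    return {
        # MCAR: every value has the same chance of being missing.
        "MCAR": rng.random(n) < 0.3,
        # MAR: missingness depends only on the observed covariate.
        "MAR": rng.random(n) < sigmoid(2.0 * zx - 1.0),
        # MNAR: missingness depends on the unobserved value itself.
        "MNAR": rng.random(n) < sigmoid(2.0 * zy - 1.0),
    }


rng = np.random.default_rng(0)
n = 20_000
age = rng.normal(40, 10, n)                         # fully observed
income = 1000 + 50 * age + rng.normal(0, 300, n)    # subject to missingness

masks = make_masks(age, income, rng)
full_mean = income.mean()
# Mean of income computed only from the rows where it is still observed.
observed_means = {name: income[~m].mean() for name, m in masks.items()}

for name, value in observed_means.items():
    print(f"{name}: observed mean minus full mean = {value - full_mean:.1f}")
```

In this setup the complete-case mean stays close to the true mean under MCAR but is pulled downward under MNAR, because high incomes are the ones most likely to vanish. Under MAR the naive mean is also biased here, since `age` is correlated with `income`, but that bias can in principle be corrected by conditioning on the observed `age`, which is not possible under MNAR without further assumptions.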

