LG | AI | Oct 11, 2024

M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation

arXiv:2410.08794v1 | 1 citation | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving data quality for machine learning applications, though it appears to be an incremental advance that builds on existing graph-based imputation methods.

The paper tackles missing value imputation by proposing M$^3$-Impute, a method that explicitly incorporates missingness information and models feature and sample correlations. Across 25 benchmark datasets under three missingness settings, it achieves 20 best and 4 second-best MAE scores on average.

Missing values are a common problem that poses significant challenges to data analysis and machine learning. This problem necessitates the development of an effective imputation method to fill in the missing values accurately, thereby enhancing the overall quality and utility of the datasets. Existing imputation methods, however, fall short in two respects: they do not explicitly consider the 'missingness' information in the data during the embedding initialization stage, and they do not model the entangled feature and sample correlations during the learning process. Both shortcomings lead to inferior performance. We propose M$^3$-Impute, which aims to explicitly leverage the missingness information and such correlations with novel masking schemes. M$^3$-Impute first models the data as a bipartite graph and uses a graph neural network to learn node embeddings, where the refined embedding initialization process directly incorporates the missingness information. The embeddings are then optimized through M$^3$-Impute's novel feature correlation unit (FRU) and sample correlation unit (SRU), which effectively capture feature and sample correlations for imputation. Experimental results on 25 benchmark datasets under three different missingness settings show the effectiveness of M$^3$-Impute, which achieves 20 best and 4 second-best MAE scores on average.
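To make the bipartite-graph view and the MAE metric concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the helper names `to_bipartite_edges` and `masked_mae`, the toy matrix, and the column-mean imputer standing in for the learned model are all assumptions made for illustration. Each observed entry becomes one edge between a sample node and a feature node, and the boolean mask records the missingness information.

```python
import numpy as np

def to_bipartite_edges(X):
    """Hypothetical helper: one edge (sample, feature, value) per observed entry.

    NaN marks a missing value; the returned mask is True where X is observed.
    """
    mask = ~np.isnan(X)
    rows, cols = np.nonzero(mask)  # sample index and feature index of each edge
    return rows, cols, X[rows, cols], mask

def masked_mae(X_true, X_imputed, eval_mask):
    """Hypothetical helper: mean absolute error over the entries in eval_mask."""
    return float(np.abs(X_true[eval_mask] - X_imputed[eval_mask]).mean())

# Toy ground truth and a copy with two entries hidden.
X_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])
X_obs = X_true.copy()
X_obs[0, 1] = np.nan
X_obs[2, 0] = np.nan

rows, cols, vals, mask = to_bipartite_edges(X_obs)

# Stand-in imputer (an assumption, not M^3-Impute): fill with column means.
col_means = np.nanmean(X_obs, axis=0)
X_imp = np.where(mask, X_obs, col_means)

# Evaluate only on the entries that were hidden.
mae = masked_mae(X_true, X_imp, ~mask)
print(len(rows), mae)
```

In a graph-based imputer, the edge list would feed a graph neural network and the mask would inform the initial node embeddings; the column-mean step here only marks where a learned model would plug in.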

Code Implementations: 1 repo