LG | Nov 7, 2024

Enhancing Missing Data Imputation through Combined Bipartite Graph and Complete Directed Graph

arXiv:2411.04907v1 | h-index: 5
Originality: Highly original
AI Analysis

This paper addresses the problem of improving imputation accuracy for tabular data. The problem itself is incremental, since the work builds on existing graph-based methods.

The paper tackles missing data imputation in tabular data by introducing the BCGNN framework, which uses bipartite and complete directed graphs to model feature interdependencies, resulting in a 15% average reduction in mean absolute error compared to state-of-the-art methods.

In this paper, we aim to address a significant challenge in the field of missing data imputation: identifying and leveraging the interdependencies among features to enhance missing data imputation for tabular data. We introduce a novel framework named the Bipartite and Complete Directed Graph Neural Network (BCGNN). Within BCGNN, observations and features are differentiated as two distinct node types, and the values of observed features are converted into attributed edges linking them. The bipartite segment of our framework inductively learns embedding representations for nodes, efficiently utilizing the comprehensive information encapsulated in the attributed edges. In parallel, the complete directed graph segment adeptly outlines and communicates the complex interdependencies among features. When compared to contemporary leading imputation methodologies, BCGNN consistently outperforms them, achieving a noteworthy average reduction of 15% in mean absolute error for feature imputation tasks under different missing mechanisms. Our extensive experimental investigation confirms that an in-depth grasp of the interdependence structure substantially enhances the model's feature embedding ability. We also highlight the model's superior performance in label prediction tasks involving missing data, and its formidable ability to generalize to unseen data points.
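The graph construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows one plausible way to turn a table with missing entries into (a) bipartite edges between observation nodes and feature nodes, attributed with the observed values, and (b) a complete directed graph over the features. The function names `build_bipartite_edges` and `build_complete_directed_edges` are hypothetical, NaN is assumed to mark a missing entry, and self-loops are assumed to be excluded from the feature graph.

```python
import numpy as np


def build_bipartite_edges(X):
    """Return (obs_idx, feat_idx, values) for every observed (non-NaN) cell.

    Each triple is one attributed edge linking observation node obs_idx[k]
    to feature node feat_idx[k], carrying the observed value values[k].
    Assumption: missing entries are encoded as NaN.
    """
    observed = ~np.isnan(X)
    obs_idx, feat_idx = np.nonzero(observed)
    return obs_idx, feat_idx, X[obs_idx, feat_idx]


def build_complete_directed_edges(n_features):
    """Return (src, dst) for every ordered pair of distinct features.

    Assumption: the complete directed graph has no self-loops, so it
    contains n_features * (n_features - 1) edges.
    """
    no_self_loops = ~np.eye(n_features, dtype=bool)
    src, dst = np.nonzero(no_self_loops)
    return src, dst


# Toy table: 2 observations, 3 features, 2 missing entries.
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])

obs_idx, feat_idx, values = build_bipartite_edges(X)
src, dst = build_complete_directed_edges(X.shape[1])

for o, f, v in zip(obs_idx, feat_idx, values):
    print(f"observation {o} -- feature {f} (value {v})")
print("directed feature edges:", list(zip(src.tolist(), dst.tolist())))
```

In this toy table the missing cells simply produce no bipartite edge, so imputation amounts to predicting the attribute of an absent edge. How BCGNN learns node embeddings and passes messages over these two graphs is specified in the paper and is not reproduced here.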

