GEDI: A Graph-based End-to-end Data Imputation Framework
This work addresses data imputation for machine learning applications. Its tailored approach is incremental, combining existing techniques such as Transformers and graph learning.
The paper tackles missing data in practical applications by proposing a graph-based end-to-end data imputation framework. The framework preserves row-wise and column-wise relationships and tailors imputation to downstream prediction tasks, and it shows consistent improvements in imputation and label prediction performance over benchmark methods on large real-world datasets.
Data imputation is an effective way to handle missing data, which is common in practical applications. In this study, we propose and test a novel data imputation process that achieves two important goals: (1) it preserves the row-wise similarities among observations and the column-wise contextual relationships among features in the feature matrix, and (2) it tailors the imputation process to a specific downstream label prediction task. The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and the similarities among observations. Moreover, it uses a meta-learning framework to select features that are influential for the downstream prediction task of interest. We conduct experiments on large real-world data sets and show that the proposed imputation process consistently improves imputation and label prediction performance over a variety of benchmark methods.
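To make the idea of iteratively refining row-wise similarities concrete, the following is a minimal sketch of graph-based imputation. It is not the GEDI method: it swaps in a plain k-nearest-neighbour graph with inverse-distance edge weights in place of the Transformer network, learned graph structure, and meta-learning described above. The function name `knn_graph_impute` and its parameters `k` and `n_iter` are hypothetical, and the sketch assumes every column has at least one observed value and that `k` is smaller than the number of rows.

```python
import numpy as np


def knn_graph_impute(X, k=2, n_iter=5):
    """Illustrative only: fill NaNs from a row-similarity graph.

    Assumes each column has at least one observed value and k < number of rows.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: column means over the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Pairwise squared Euclidean distances between rows.
        diff = filled[:, None, :] - filled[None, :, :]
        dist = (diff ** 2).sum(axis=-1)
        np.fill_diagonal(dist, np.inf)  # exclude self-loops

        # Graph edges: each row connects to its k nearest rows.
        nbrs = np.argsort(dist, axis=1)[:, :k]
        nbr_dist = np.take_along_axis(dist, nbrs, axis=1)

        # Inverse-distance edge weights, normalised per row.
        w = 1.0 / (1.0 + nbr_dist)
        w /= w.sum(axis=1, keepdims=True)

        # Neighbour-weighted estimate for every cell.
        est = (w[:, :, None] * filled[nbrs]).sum(axis=1)

        # Overwrite only the originally missing cells; the graph is
        # rebuilt from the refined values on the next pass.
        filled = np.where(missing, est, X)

    return filled


if __name__ == "__main__":
    X = np.array([
        [1.0, 1.0, 1.0],
        [1.0, 1.0, np.nan],
        [9.0, 9.0, 9.0],
        [9.0, 9.0, 8.0],
    ])
    print(knn_graph_impute(X, k=1))
```

Each pass rebuilds the graph from the current imputed values, so the similarities and the imputations refine one another. The proposed framework additionally learns column-wise relationships and conditions on the downstream label, neither of which this sketch attempts.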