LG · DB · Jan 10, 2022

Differentiable and Scalable Generative Adversarial Models for Data Imputation

arXiv:2201.03202v1 · 28 citations
Originality Highly original
AI Analysis

This paper addresses the scalability problem of data imputation on large real-world datasets, offering a significant training speed-up under user-specified accuracy guarantees.

The paper tackles the computational inefficiency of training generative adversarial models for data imputation on large-scale incomplete datasets, proposing a system (SCIS) that accelerates training by 7.1x while maintaining competitive accuracy using only 7.6% of samples.

Data imputation has been extensively explored to solve the missing data problem. The dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective scalable imputation system named SCIS to significantly speed up the training of differentiable generative adversarial imputation models under accuracy guarantees for large-scale incomplete data. SCIS consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable, while for such a differentiable imputation model, SSE can estimate an appropriate sample size to ensure the user-specified imputation accuracy of the final model. Extensive experiments on several real-life large-scale datasets demonstrate that our proposed system can accelerate generative adversarial model training by 7.1x. Using around 7.6% of the samples, SCIS yields accuracy competitive with state-of-the-art imputation methods in a much shorter computation time.
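To make the role of a Sinkhorn-type loss in DIM more concrete, the following is a minimal pure-Python sketch of an entropy-regularized Sinkhorn divergence computed on masked samples. It is not the paper's definition of the masking Sinkhorn divergence: the elementwise masking inside the squared-Euclidean cost, the uniform sample weights, the `eps` and `iters` defaults, and all function names (`masked_cost`, `entropic_ot`, `masked_sinkhorn_divergence`) are assumptions made for illustration. The point it illustrates is that such a divergence is a smooth function of the imputed values, which is what allows it to serve as a differentiable training loss.

```python
import math

def masked_cost(x, y, mx, my):
    """Pairwise squared Euclidean cost between masked rows (mask 1 = observed).
    Assumption: masks are applied elementwise before taking the distance."""
    return [[sum((mi * a - mj * b) ** 2
                 for a, b, mi, mj in zip(xr, yr, mr, ms))
             for yr, ms in zip(y, my)]
            for xr, mr in zip(x, mx)]

def _logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def entropic_ot(cost, eps=1.0, iters=200):
    """Dual value of entropy-regularized OT with uniform weights,
    computed with log-domain Sinkhorn iterations."""
    n, m = len(cost), len(cost[0])
    log_a, log_b = -math.log(n), -math.log(m)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(iters):
        f = [-eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps
                                for j in range(m)]) for i in range(n)]
        g = [-eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps
                                for i in range(n)]) for j in range(m)]
    return sum(f) / n + sum(g) / m

def masked_sinkhorn_divergence(x, y, mx, my, eps=1.0, iters=200):
    """Debiased divergence: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    xy = entropic_ot(masked_cost(x, y, mx, my), eps, iters)
    xx = entropic_ot(masked_cost(x, x, mx, mx), eps, iters)
    yy = entropic_ot(masked_cost(y, y, my, my), eps, iters)
    return xy - 0.5 * xx - 0.5 * yy

# Hypothetical toy data: two samples per batch, two features each.
full = [[1, 1], [1, 1]]
imputed = [[0.0, 0.0], [1.0, 0.0]]
observed = [[3.0, 3.0], [4.0, 3.0]]
print(masked_sinkhorn_divergence(imputed, observed, full, full))
```

In this sketch, entries whose mask is 0 contribute nothing to the cost, so two batches that differ only in masked-out positions have zero divergence. A practical implementation would use an automatic-differentiation framework so that gradients flow back into the imputation model.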

