Imputation of Missing Data with Class Imbalance using Conditional Generative Adversarial Networks
This work offers an incremental improvement for data scientists and researchers who handle missing data in imbalanced datasets, providing a more accurate imputation method.
This paper addresses the problem of missing data imputation, particularly in datasets with class imbalance. The authors propose a new method, Conditional Generative Adversarial Imputation Network (CGAIN), which imputes missing data using class-specific distributions, achieving superior performance compared to state-of-the-art methods on benchmark datasets.
Missing data is a common problem in real-world datasets, and imputation is a widely used technique for estimating the missing values. State-of-the-art imputation approaches, such as Generative Adversarial Imputation Nets (GAIN), model the distribution of the observed data to approximate the missing values. Such approaches usually model a single distribution for the entire dataset, which overlooks the class-specific characteristics of the data. These characteristics are especially useful when there is class imbalance, since a single distribution fitted to the whole dataset tends to be dominated by the majority class. We propose a new method that imputes missing data based on its class-specific characteristics by adapting the popular Conditional Generative Adversarial Network (CGAN). Our Conditional Generative Adversarial Imputation Network (CGAIN) imputes the missing data using class-specific distributions, which can produce more accurate estimates of the missing values. We evaluated our approach on benchmark datasets, where it outperformed state-of-the-art and other popular imputation approaches.
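The following is a minimal NumPy sketch of how class conditioning might plug into a GAIN-style imputation pipeline. It assumes that the class label is appended as a one-hot vector to the inputs of both the generator and the discriminator. The abstract does not specify CGAIN's architecture, so this conditioning scheme, the function names (`generator_input`, `hint_matrix`, `toy_generator`, `impute`), and the untrained one-layer generator are all hypothetical. Training (the adversarial loss plus GAIN's reconstruction loss on observed entries) is omitted. Only the data flow is shown: noise-filled input, mask, hint, label conditioning, and the final masked combination.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Encode integer class labels as one-hot rows."""
    encoded = np.zeros((labels.shape[0], num_classes))
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


def generator_input(x, mask, labels, num_classes, rng):
    """Build a GAIN-style generator input with the class label appended.

    x is scaled to [0, 1] with NaN marking missing entries; mask is 1 where
    a value is observed and 0 where it is missing.
    """
    # Fill missing slots with small uniform noise, as in the GAIN reference code.
    noise = rng.uniform(0.0, 0.01, size=x.shape)
    x_tilde = np.where(mask == 1, np.nan_to_num(x), noise)
    # Conditioning step (assumption): concatenate the one-hot class label.
    g_in = np.hstack([x_tilde, mask, one_hot(labels, num_classes)])
    return g_in, x_tilde


def hint_matrix(mask, hint_rate, rng):
    """GAIN hint H = B*M + 0.5*(1 - B), with B ~ Bernoulli(hint_rate)."""
    b = (rng.uniform(size=mask.shape) < hint_rate).astype(float)
    return b * mask + 0.5 * (1.0 - b)


def toy_generator(g_in, weights, bias):
    """Hypothetical untrained one-layer generator; sigmoid keeps outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(g_in @ weights + bias)))


def impute(x, mask, g_out):
    """Keep observed values and use generator output only where values are missing."""
    return mask * np.nan_to_num(x) + (1.0 - mask) * g_out


rng = np.random.default_rng(0)
x = np.array([[0.2, np.nan, 0.7],
              [np.nan, 0.5, 0.1],
              [0.9, 0.4, np.nan]])
labels = np.array([0, 1, 0])  # imbalanced toy labels: two rows of class 0, one of class 1
mask = (~np.isnan(x)).astype(float)
num_classes = 2

g_in, x_tilde = generator_input(x, mask, labels, num_classes, rng)
weights = rng.normal(scale=0.1, size=(g_in.shape[1], x.shape[1]))
bias = np.zeros(x.shape[1])
x_hat = impute(x, mask, toy_generator(g_in, weights, bias))

# The discriminator would receive the imputed matrix, the hint, and the label.
d_in = np.hstack([x_hat, hint_matrix(mask, 0.9, rng), one_hot(labels, num_classes)])
print(g_in.shape, d_in.shape)  # (3, 8) (3, 8)
```

Because observed entries pass through the mask unchanged, only the missing entries depend on the generator. In a trained conditional model, the label input would let the generator draw those entries from a class-specific distribution rather than from a single distribution fitted to the whole dataset.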