Towards Stable Imbalanced Data Classification via Virtual Big Data Projection
This work is aimed at researchers dealing with imbalanced datasets and autoencoder training. It presents a projection-based method that incrementally extends the applications of Virtual Big Data (VBD).
The paper applies Virtual Big Data (VBD) to two problems: deep autoencoder training, where it reduces over-fitting and significantly lowers validation loss, and imbalanced data classification, where it balances skewed class distributions without over-sampling and addresses the uncertainty problem of data-driven methods.
Virtual Big Data (VBD) was recently shown to be effective in alleviating mode collapse and vanishing generator gradients, two major problems of Generative Adversarial Networks (GANs). In this paper, we investigate the capability of VBD to address two other major challenges in machine learning: deep autoencoder training and imbalanced data classification. First, we prove that VBD can significantly decrease the validation loss of autoencoders by providing them with a large, diversified training set, which is the key to better generalization and to minimizing over-fitting. Second, we use VBD to propose the first projection-based method, called cross-concatenation, which balances skewed class distributions without over-sampling. We prove that cross-concatenation can solve the uncertainty problem of data-driven methods for imbalanced classification.
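As an illustration only: the abstract does not spell out how cross-concatenation is constructed, so the following Python sketch shows one way a pairwise concatenation could equalize class counts without duplicating or synthesizing points. The function name `cross_concatenate`, the order-based labeling, and the toy data are all assumptions made for this example, not the paper's definition.

```python
from itertools import product


def cross_concatenate(minority, majority):
    """Hypothetical pairing scheme (an assumption, not the paper's method).

    Every virtual sample concatenates one minority vector with one majority
    vector. The order of concatenation encodes the label, so both labels
    receive exactly len(minority) * len(majority) samples.
    """
    # Minority vector first: labeled 1 in this sketch.
    min_first = [list(a) + list(b) for a, b in product(minority, majority)]
    # Majority vector first: labeled 0 in this sketch.
    maj_first = [list(b) + list(a) for a, b in product(minority, majority)]
    samples = min_first + maj_first
    labels = [1] * len(min_first) + [0] * len(maj_first)
    return samples, labels


# Toy data: 2 minority points and 6 majority points, each 2-dimensional.
minority = [[0.9, 0.8], [1.0, 0.7]]
majority = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
            [0.1, 0.0], [0.3, 0.2], [0.2, 0.3]]

samples, labels = cross_concatenate(minority, majority)
print(len(samples), labels.count(1), labels.count(0))
```

In this sketch the original 2:6 class ratio becomes 12:12 in the projected space, and each virtual sample has twice the original dimensionality. Whether the paper's cross-concatenation pairs samples in this way is not stated in the abstract.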