Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder
This work addresses the problem of limited training data for classifiers in domains such as medical imaging, though it is an incremental improvement over existing VAE methods.
The paper tackles data augmentation in high-dimensional, low-sample-size settings by proposing a geometry-based variational autoencoder that improves classification metrics, such as increasing balanced accuracy from 66.3% to 74.3% and from 77.7% to 86.3% in medical imaging tasks.
In this paper, we propose a new method to perform data augmentation in a reliable way in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder (VAE). Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme that produces more meaningful samples, especially in the context of small data sets. The proposed method is tested through a wide experimental study in which its robustness to data sets, classifiers and training sample size is stressed. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics. For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art CNN classifier trained with 50 MRIs of cognitively normal (CN) and 50 of Alzheimer's disease (AD) patients, and from 77.7% to 86.3% when trained with 243 CN and 210 AD, while greatly improving the sensitivity and specificity metrics.
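The reported gains are expressed in balanced accuracy, which for a binary task is the mean of sensitivity and specificity. As a reminder of how the three metrics relate, here is a minimal sketch in plain Python. The function name `balanced_accuracy` and the toy labels are hypothetical and chosen only for illustration; they are not data or code from the paper, and AD is assumed to be the positive class.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Return (balanced accuracy, sensitivity, specificity) for binary labels.

    Hypothetical helper for illustration; assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return (sensitivity + specificity) / 2, sensitivity, specificity


# Toy labels (invented for illustration): 1 = AD, 0 = CN.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
bal_acc, sens, spec = balanced_accuracy(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} balanced accuracy={bal_acc:.3f}")
```

Because each class contributes equally regardless of its size, this metric remains informative when the CN and AD groups are unbalanced, as in the 243 CN versus 210 AD setting.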