Data-level hybrid strategy selection for disk fault prediction model based on multivariate GAN
This work addresses disk fault prediction for storage systems, but the contribution is incremental: it combines existing techniques (GANs and genetic algorithms) rather than introducing a fundamentally new approach.
The paper tackles class imbalance in disk fault prediction from SMART data by mixing synthetic data from multivariate GANs to balance the dataset and combining this with genetic algorithms, reporting higher classification accuracy for a specific model.
Class imbalance is a common problem in classification tasks, where minority-class samples are often more important and more costly to misclassify. Addressing it is therefore important. Disk SMART data, a reliable indicator of a disk's health status, is markedly imbalanced: it contains a large number of healthy samples and comparatively few faulty ones. In this paper, we balance the disk SMART dataset at the data level by mixing and integrating data synthesised by multivariate generative adversarial networks (GANs), obtaining the best balanced dataset for a specific classification model; we combine this with genetic algorithms to achieve higher disk fault prediction accuracy on that model.
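The abstract does not spell out how the genetic algorithm and the GAN outputs interact. One plausible reading, sketched below purely as an illustration, is that the genetic algorithm searches over how many synthetic faulty samples to take from each generator, scoring each mix by how well a fixed classifier performs on held-out data. Everything in the sketch is an assumption and not the paper's method: the toy two-feature data stands in for SMART attributes, the two `pools` stand in for the outputs of two hypothetical GANs, a nearest-centroid classifier stands in for the "specific classification model", and the names `fitness` and `select_mix` are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(cents, x):
    # Assign the class whose centroid is closest in squared Euclidean distance.
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

def balanced_accuracy(cents, X, y):
    # Mean of per-class recall, so the minority class counts as much as the majority.
    recalls = []
    for label in set(y):
        idx = [i for i, t in enumerate(y) if t == label]
        recalls.append(sum(predict(cents, X[i]) == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def fitness(chrom, train, pools, val):
    """Score a mix: chrom[i] synthetic faulty samples are taken from pools[i]."""
    X, y = list(train[0]), list(train[1])
    for count, pool in zip(chrom, pools):
        X += pool[:count]
        y += [1] * count
    return balanced_accuracy(fit_centroids(X, y), val[0], val[1])

def select_mix(train, pools, val, pop_size=12, generations=15, seed=0):
    """Hypothetical genetic search over per-pool sample counts."""
    rng = random.Random(seed)
    sizes = [len(p) for p in pools]
    # Start from the no-synthetic baseline plus random mixes.
    pop = [[0] * len(pools)]
    pop += [[rng.randint(0, s) for s in sizes] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, train, pools, val), reverse=True)
        parents = pop[: pop_size // 2]            # elitism: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(len(child))
            child[i] = rng.randint(0, sizes[i])               # mutate one gene
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, train, pools, val))

def blob(rng, center, n):
    # Toy 2-feature samples around a center; not real SMART attributes.
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)] for _ in range(n)]

rng = random.Random(42)
train = (blob(rng, (0, 0), 200) + blob(rng, (3, 3), 5), [0] * 200 + [1] * 5)
val = (blob(rng, (0, 0), 50) + blob(rng, (3, 3), 50), [0] * 50 + [1] * 50)
pools = [blob(rng, (3, 3), 100), blob(rng, (1, 1), 100)]  # two hypothetical generators

best = select_mix(train, pools, val)
print("samples per pool:", best, "score:", fitness(best, train, pools, val))
```

Because the initial population includes the no-synthetic baseline and the better half of each generation is carried over unchanged, the selected mix never scores below training on the original imbalanced data alone under this fitness function. In a realistic setting the fitness would be computed with the actual target classifier and a validation split of real SMART records, which is what makes the resulting balanced dataset specific to that model.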