Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum
This work addresses the time-consuming training of deep learning models by enabling performance prediction before training. The contribution is incremental, as it builds on existing complexity assessment methods.
The paper tackles the problem of predicting classification performance before training deep convolutional neural networks by assessing dataset complexity. It proposes a novel method, cmsAULS, that achieves state-of-the-art complexity assessment performance on six datasets.
Dataset complexity assessment aims to predict the classification performance achievable on a dataset by computing its complexity before a classifier is trained; such assessments can also guide classifier selection and dataset reduction. Training deep convolutional neural networks (DCNNs) is iterative and time-consuming because of uncertainty in hyperparameter choices and the domain shift introduced by different datasets. It is therefore valuable to predict classification performance by effectively assessing dataset complexity before training DCNN models. This paper proposes a novel method, cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS), which achieves state-of-the-art complexity assessment performance on six datasets.
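The abstract does not spell out how cmsAULS is computed, so the sketch below covers only the building block named in the method's title: the Laplacian spectrum. It assumes, hypothetically, that a symmetric class-similarity matrix `W` (one row and column per class) is already available; how the paper estimates `W` and how it turns the spectrum into the final score are not stated here. The functions `laplacian_spectrum` and `count_near_zero` are illustrative names, not the paper's code.

```python
import numpy as np

def laplacian_spectrum(W: np.ndarray) -> np.ndarray:
    """Return the eigenvalues (ascending) of the unnormalized Laplacian L = D - W.

    W is a hypothetical class-similarity matrix: W[i, j] >= 0 measures how
    much class i overlaps with class j. It is symmetrized before use.
    """
    W = 0.5 * (W + W.T)              # enforce symmetry
    D = np.diag(W.sum(axis=1))       # degree matrix
    L = D - W                        # self-similarities on the diagonal cancel out
    return np.linalg.eigvalsh(L)     # eigvalsh returns real eigenvalues in ascending order

def count_near_zero(eigenvalues: np.ndarray, tol: float = 1e-9) -> int:
    """Count (near-)zero eigenvalues, i.e. connected components of the class graph."""
    return int(np.sum(np.abs(eigenvalues) < tol))

# Hypothetical example: perfectly separated classes (no cross-class similarity).
separated = np.eye(3)
# Hypothetical example: fully overlapping classes (every pair maximally similar).
overlapping = np.ones((3, 3))

print(laplacian_spectrum(separated))    # all zeros
print(laplacian_spectrum(overlapping))  # approximately [0, 3, 3]
```

For the unnormalized Laplacian, the number of (near-)zero eigenvalues equals the number of connected components of the similarity graph. Fully separated classes therefore yield an all-zero spectrum, while overlapping classes push eigenvalues upward, which is why the spectrum can serve as a signal of class overlap.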