Spectral Metric for Dataset Complexity Assessment
The work gives researchers and practitioners a tool for evaluating dataset complexity, with applications such as dataset reduction and accuracy approximation. The contribution is incremental, since it builds on existing complexity measures.
The paper addresses the problem of assessing the complexity of image classification datasets. It proposes a new measure, the cumulative spectral gradient (CSG), which strongly correlates with CNN test accuracy and is reported to be more accurate and faster than previous complexity measures on 11 datasets and three CNN models.
In this paper, we propose a new measure to gauge the complexity of image classification problems. Given an annotated image dataset, our method computes a complexity measure called the cumulative spectral gradient (CSG), which strongly correlates with the test accuracy of convolutional neural networks (CNNs). The CSG measure is derived from the probabilistic divergence between classes in a spectral clustering framework. We show that this metric correlates with the overall separability of the dataset and thus its inherent complexity. As will be shown, our metric can be used for dataset reduction, to assess which classes are more difficult to disentangle, and to approximate the accuracy one could expect to get with a CNN. Results obtained on 11 datasets and three CNN models reveal that our method is more accurate and faster than previous complexity measures.
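The abstract does not spell out how the measure is computed, so the following is only a rough sketch of the general idea (class overlap, then a spectral summary), not necessarily the authors' exact procedure. Several choices here are assumptions made for illustration: the function name `spectral_complexity`, the parameter `k`, the nearest-neighbour estimate of class overlap, the Bray-Curtis-style similarity between classes, and the normalisation of the eigenvalue gaps. The input `features` stands for any fixed-length representation of the images.

```python
import numpy as np


def spectral_complexity(features, labels, k=5):
    """Illustrative spectral complexity score (assumed formulation, see lead-in)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_classes = len(classes)

    # Pairwise squared Euclidean distances; a sample is never its own neighbour.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)
    neighbor_labels = labels[np.argsort(dist, axis=1)[:, :k]]

    # S[i, j]: average fraction of the k nearest neighbours of class-i samples
    # that belong to class j (assumed stand-in for the class overlap).
    S = np.zeros((n_classes, n_classes))
    for i, ci in enumerate(classes):
        rows = neighbor_labels[labels == ci]
        for j, cj in enumerate(classes):
            S[i, j] = np.mean(rows == cj)

    # Symmetric class-to-class similarity: 1 - Bray-Curtis distance between rows of S.
    W = np.zeros_like(S)
    for i in range(n_classes):
        for j in range(n_classes):
            W[i, j] = 1.0 - np.abs(S[i] - S[j]).sum() / np.abs(S[i] + S[j]).sum()

    # Graph Laplacian of the class graph and its eigenvalues in ascending order.
    laplacian = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(laplacian)

    # Normalised gaps between consecutive eigenvalues, summed after a running maximum.
    gaps = np.diff(eigvals) / (n_classes - np.arange(n_classes - 1))
    return float(np.maximum.accumulate(gaps).sum())
```

In this sketch, classes whose samples never have neighbours from another class give a zero Laplacian and therefore a score of 0, while heavily overlapping classes give a larger score. The pairwise distance matrix makes it suitable only for small samples.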