A Study on the Predictability of Sample Learning Consistency
This work addresses the challenge of efficiently determining sample difficulty for curriculum learning, though it is incremental as it builds on existing C-Score methods without achieving practical improvements.
The study investigated whether C-Score, a metric for sample difficulty in curriculum learning, can be predicted by models trained on CIFAR-10 and CIFAR-100. The models generalized poorly, suggesting that C-Score depends on factors such as a sample's relation to its neighbours rather than on the sample's individual characteristics.
Curriculum Learning is a powerful training method that allows for faster and better training in some settings. This method, however, requires a notion of which examples are difficult and which are easy, which is not always trivial to provide. A recent metric called C-Score acts as a proxy for example difficulty by relating it to learning consistency. Unfortunately, computing C-Score is compute-intensive, which limits its applicability to other datasets. In this work, we train models through different methods to predict C-Score for CIFAR-100 and CIFAR-10. We find, however, that these models generalize poorly, both within the same distribution and out of distribution. This suggests that C-Score is not defined by the individual characteristics of each sample but rather by other factors. We hypothesize that a sample's relation to its neighbours, in particular how many of them share the same label, can help explain C-Scores. We plan to explore this in future work.
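As an illustration of the neighbourhood hypothesis, the following is a minimal sketch of one way a "fraction of neighbours sharing the same label" quantity could be computed. It is not the method used in this work: the function name neighbour_label_agreement, the choice of Euclidean distance, the value of k, and the toy feature vectors are all assumptions made for the example. In practice the features would presumably come from some learned embedding of the images.

```python
import numpy as np

def neighbour_label_agreement(features, labels, k=5):
    """Fraction of each sample's k nearest neighbours that share its label.

    Hypothetical helper: Euclidean distance and brute-force search are
    assumptions chosen for brevity, not taken from the paper.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = (features ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    # A sample must not count as its own neighbour
    np.fill_diagonal(dists, np.inf)
    # Indices of the k closest other samples for every row
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    # Compare each neighbour's label with the sample's own label
    return (labels[nn_idx] == labels[:, None]).mean(axis=1)

# Toy data (assumed): two clusters, plus one sample labelled 1 that sits
# inside the cluster of label-0 samples.
toy_features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [0.5, 0.5]]
toy_labels = [0, 0, 0, 1, 1, 1]
agreement = neighbour_label_agreement(toy_features, toy_labels, k=2)
```

Under the hypothesis, samples with low agreement (such as the last toy sample, whose nearest neighbours all carry a different label) would be expected to have low C-Scores. Whether such a quantity actually correlates with C-Score is left open here.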