LG CV Nov 23, 2023

Class Uncertainty: A Measure to Mitigate Class Imbalance

Z. S. Baltaci, K. Oksuz, S. Kuzucu, K. Tezoren, B. K. Konar, A. Ozkan, E. Akbas, S. Kalkan

arXiv:2311.14090v27.76 citations h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses class imbalance for machine learning practitioners by introducing a measure that also covers cases where classes of equal size are still imbalanced. The advance is incremental, since the measure is incorporated into existing mitigation methods rather than replacing them.

The paper tackles the class imbalance problem in deep learning by proposing 'Class Uncertainty', a measure that captures differences across classes better than class cardinality does. It demonstrates the measure's effectiveness on long-tailed datasets and on a newly curated dataset, SVCI-20, reporting improved performance when the measure is incorporated into ten mitigation methods.

Class-wise characteristics of training examples affect the performance of deep classifiers. A well-studied example is when the number of training examples per class follows a long-tailed distribution, a situation that is likely to yield sub-optimal performance for under-represented classes. This class imbalance problem is conventionally addressed by approaches relying on the class-wise cardinality of training examples, such as data resampling. In this paper, we demonstrate that considering solely the cardinality of classes does not cover all issues causing class imbalance. To measure class imbalance, we propose "Class Uncertainty" as the average predictive uncertainty of the training examples, and we show that this novel measure captures the differences across classes better than cardinality. We also curate SVCI-20 as a novel dataset in which the classes have an equal number of training examples but differ in terms of their hardness, thereby causing a type of class imbalance that cannot be addressed by approaches relying on cardinality. We incorporate our "Class Uncertainty" measure into a diverse set of ten class imbalance mitigation methods to demonstrate its effectiveness on long-tailed datasets as well as on our SVCI-20. Code and datasets will be made available.
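
The abstract defines Class Uncertainty as the average predictive uncertainty of a class's training examples, but it does not say which uncertainty estimator is used. The following is a minimal sketch, assuming the entropy of a classifier's softmax output as the per-example uncertainty. The function names, the toy probabilities, and the normalization of uncertainties into class weights are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import defaultdict


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution.

    Assumption: entropy stands in for 'predictive uncertainty'; the paper
    may use a different estimator.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def class_uncertainty(probabilities, labels):
    """Average the per-example uncertainty over the examples of each class."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for probs, label in zip(probabilities, labels):
        totals[label] += predictive_entropy(probs)
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def uncertainty_weights(uncertainty):
    """Normalize class uncertainties into weights that sum to 1.

    Hypothetical use: such weights could replace cardinality-based weights
    in a resampling or loss-reweighting scheme.
    """
    total = sum(uncertainty.values())
    if total == 0.0:
        # All classes are predicted with full confidence: fall back to uniform.
        return {label: 1.0 / len(uncertainty) for label in uncertainty}
    return {label: u / total for label, u in uncertainty.items()}


# Toy data: both classes have two examples, but class 1 is harder.
probabilities = [
    [0.98, 0.02],  # class 0, confident prediction
    [0.95, 0.05],  # class 0, confident prediction
    [0.45, 0.55],  # class 1, uncertain prediction
    [0.40, 0.60],  # class 1, uncertain prediction
]
labels = [0, 0, 1, 1]

uncertainty = class_uncertainty(probabilities, labels)
weights = uncertainty_weights(uncertainty)
```

In this toy setting the two classes have equal cardinality, so a cardinality-based method would treat them identically, while the uncertainty-based weights assign more weight to the harder class 1. This mirrors the situation the abstract describes for SVCI-20.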
