Class-Balanced Active Learning for Image Classification
This work addresses active learning for image classification on real-world, long-tail distributed datasets, offering an incremental improvement by integrating class balancing into existing methods.
The paper tackles active learning on imbalanced datasets by proposing a class-balanced optimization framework, and reports performance gains in both imbalanced and balanced settings across three datasets.
Active learning aims to reduce the labeling effort required to train algorithms by learning an acquisition function that selects, from a large unlabeled data pool, the most relevant data for which a label should be requested. Active learning is generally studied on balanced datasets, where an equal number of images per class is available. However, real-world datasets suffer from severely imbalanced classes, the so-called long-tail distribution. We argue that this further complicates the active learning process, since the imbalanced data pool can result in suboptimal classifiers. To address this problem in the context of active learning, we propose a general optimization framework that explicitly takes class balancing into account. Results on three datasets show that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods. In addition, we show that our method generally yields a performance gain on balanced datasets as well.
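The idea of folding class balance into an acquisition step can be illustrated with a minimal greedy sketch. This is not the optimization framework proposed in the paper, whose exact formulation is not given in this abstract. The function name `class_balanced_select`, the trade-off weight `lam`, the uniform per-class target, and the greedy strategy are all assumptions made for illustration. The sketch takes per-sample acquisition scores from any base method (for example, predictive entropy) together with the classifier's softmax predictions on the unlabeled pool.

```python
import numpy as np


def class_balanced_select(scores, probs, labeled_counts, budget, lam=1.0):
    """Greedily pick `budget` pool indices, trading off score and class balance.

    scores:         (n,) acquisition scores from any base method (assumed given).
    probs:          (n, c) predicted class probabilities for the unlabeled pool.
    labeled_counts: (c,) number of labeled samples per class so far.
    lam:            hypothetical weight on the balance term (0 disables it).
    """
    scores = np.asarray(scores, dtype=float)
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(labeled_counts, dtype=float).copy()
    n_classes = probs.shape[1]

    # Assumed target: an equal share of the labeled set for every class.
    target = (counts.sum() + budget) / n_classes

    available = np.ones(len(scores), dtype=bool)
    selected = []
    for _ in range(budget):
        # Shortfall per class; classes at or above the target contribute zero.
        deficit = np.maximum(target - counts, 0.0)
        weights = deficit / max(deficit.sum(), 1e-12)
        # Expected mass each sample adds to under-represented classes.
        balance_gain = probs @ weights
        utility = scores + lam * balance_gain
        utility[~available] = -np.inf
        i = int(np.argmax(utility))
        selected.append(i)
        available[i] = False
        # Soft update: true labels are unknown, so use the predicted distribution.
        counts += probs[i]
    return selected
```

With `lam=0` the routine reduces to plain top-`budget` selection by score, which suggests how a balancing term of this kind could be layered on top of an existing acquisition function without changing it.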