Cold Start Active Learning Strategies in the Context of Imbalanced Classification
This work addresses the challenge of initializing classification on imbalanced datasets when no labels are available. The contribution is incremental, as it builds on existing active learning methods.
The paper tackles the cold start problem in active learning for imbalanced classification. It proposes strategies that combine clustering and label propagation to address label scarcity and class imbalance, and it demonstrates that they boost recall for the minority class in a Twitter case study on flood event testimonies.
We present novel active learning strategies dedicated to the cold start stage, i.e., initializing the classification of a large set of data with no attached labels. Moreover, the proposed strategies are designed to handle an imbalanced context in which random selection is highly inefficient. Specifically, our active learning iterations address label scarcity and imbalance using element scores that combine information extracted from a clustering structure with a label propagation model. The strategy is illustrated by a case study on annotating Twitter content with respect to testimonies of a real flood event. We show that our method copes effectively with class imbalance by boosting the recall of samples from the minority class.
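To make the idea of element scores more concrete, the following is a minimal sketch of a cold-start loop that combines a clustering structure with label propagation. It is not the authors' implementation. The scoring rule (propagated minority probability plus a bonus for clusters with few labels), the farthest-first clustering, the Gaussian kernel, the toy data, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def blob(cx, cy, n, radius=0.5):
    """Toy data: n points evenly spaced on a circle around (cx, cy)."""
    return [(cx + radius * math.cos(2 * math.pi * t / n),
             cy + radius * math.sin(2 * math.pi * t / n)) for t in range(n)]

def farthest_first_clusters(points, k):
    """Pick k spread-out seeds, then assign every point to its nearest seed."""
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(points)),
                         key=lambda i: min(math.dist(points[i], points[s])
                                           for s in seeds)))
    return [min(range(k), key=lambda c: math.dist(p, points[seeds[c]]))
            for p in points]

def propagate(weights, labels, iters=30):
    """Label propagation: repeated kernel-weighted averaging, known labels clamped.

    Returns an estimate of P(minority) per point; unlabeled points start at 0.5.
    """
    n = len(weights)
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        avg = [sum(w * v for w, v in zip(weights[i], f)) / sum(weights[i])
               for i in range(n)]
        f = [float(labels[i]) if i in labels else avg[i] for i in range(n)]
    return f

def select_next(clusters, labels, f):
    """Hypothetical element score: propagated minority probability
    plus an exploration bonus for clusters that hold few labels so far."""
    labeled_in = Counter(clusters[i] for i in labels)
    score = {i: f[i] + 1.0 / (1 + labeled_in[clusters[i]])
             for i in range(len(f)) if i not in labels}
    return max(score, key=score.get)

# Imbalanced toy set: 40 majority points in two groups, 5 minority points.
points = blob(0, 0, 20) + blob(10, 0, 20) + blob(5, 8, 5)
truth = [0] * 40 + [1] * 5  # 1 = minority; stands in for the human annotator

# Gaussian kernel weights (bandwidth 1, an assumption); no self-loops.
weights = [[0.0 if i == j else math.exp(-math.dist(p, q) ** 2 / 2)
            for j, q in enumerate(points)] for i, p in enumerate(points)]
clusters = farthest_first_clusters(points, k=3)

labels = {}  # cold start: no labels at all
for _ in range(10):  # annotation budget of 10 queries
    f = propagate(weights, labels)
    i = select_next(clusters, labels, f)
    labels[i] = truth[i]  # query the (simulated) annotator

found = sum(labels.values())
print(f"minority samples found: {found} of 5 after {len(labels)} queries")
```

On this toy set, the exploration bonus first spreads queries across the three clusters, and the propagated scores then concentrate the remaining budget on the minority group. For comparison, drawing 10 of the 45 points uniformly at random would find about 1.1 minority samples in expectation (10 × 5 / 45), which illustrates why random selection is inefficient under imbalance.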