Pool-Based Active Learning with Proper Topological Regions
This work addresses the cost of labeling data in machine learning applications where large labeled datasets are hard to obtain. It appears incremental, since it builds on existing pool-based active learning methods.
The paper addresses the selection of the most informative unlabeled points for training in multi-class classification tasks, proposing a meta-approach based on Proper Topological Regions (PTR) derived from topological data analysis. On benchmark datasets, the resulting method is empirically competitive with classical active learning strategies.
Machine learning methods usually rely on large sample sizes to perform well, yet labeled sets are difficult to obtain in many applications. Pool-based active learning methods aim to identify, among a set of unlabeled data, the points that are most relevant for training. In this paper, we propose a meta-approach for pool-based active learning strategies in multi-class classification tasks, based on Proper Topological Regions. PTR, which are derived from topological data analysis (TDA), are relevant regions used either to sample cold-start points or within the active learning scheme. The proposed method is illustrated empirically on various benchmark datasets, where it is competitive with classical methods from the literature.
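The abstract names two uses of regions: picking cold-start points and supporting the active learning loop. The sketch below illustrates only that general pattern under stated assumptions. It is not the authors' PTR construction. As a hypothetical stand-in for PTR, it takes the connected components of an ε-neighbourhood graph at one hand-picked scale `eps` (the clusters that 0-dimensional persistent homology tracks as the scale grows). It labels one medoid per component as the cold start, then runs plain margin-based uncertainty sampling with a nearest-centroid classifier as the base strategy. All function names (`regions_at_scale`, `cold_start`, `centroids`, `margin`, `active_learning`) and the `oracle` callback are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def regions_at_scale(points: Sequence[Point], eps: float) -> List[List[int]]:
    """Hypothetical stand-in for PTR: connected components of the graph that
    links every pair of points at Euclidean distance <= eps (O(n^2) pairs)."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # union the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def cold_start(points: Sequence[Point], eps: float) -> List[int]:
    """Pick one medoid per region (smallest summed distance within its region)."""
    seeds = [
        min(region, key=lambda i: sum(math.dist(points[i], points[j]) for j in region))
        for region in regions_at_scale(points, eps)
    ]
    return sorted(seeds)


def centroids(points: Sequence[Point], labels: Dict[int, int]) -> Dict[int, Point]:
    """Mean of the labeled points of each class (nearest-centroid classifier)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for i, y in labels.items():
        acc = sums.setdefault(y, [0.0] * len(points[i]))
        for k, v in enumerate(points[i]):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def margin(x: Point, cents: Dict[int, Point]) -> float:
    """Gap between distances to the two nearest centroids; small means uncertain."""
    d = sorted(math.dist(x, c) for c in cents.values())
    return d[1] - d[0] if len(d) > 1 else math.inf


def active_learning(points: Sequence[Point], oracle: Callable[[int], int],
                    eps: float, budget: int) -> Dict[int, int]:
    """Label the cold-start seeds, then query up to `budget` points one at a time."""
    labels = {i: oracle(i) for i in cold_start(points, eps)}
    for _ in range(budget):
        pool = [i for i in range(len(points)) if i not in labels]
        if not pool:
            break  # nothing left to query
        cents = centroids(points, labels)
        query = min(pool, key=lambda i: margin(points[i], cents))  # least confident point
        labels[query] = oracle(query)
    return labels
```

For example, with points `[(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (2.5, 2.5)]` and `eps=0.5`, `cold_start` returns `[1, 3, 5]`, one medoid for each of the three regions. Swapping in another base strategy only means replacing `margin`. A faithful PTR implementation would replace `regions_at_scale`, including however the scale is chosen.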