LG | CV | Dec 26, 2023

BAL: Balancing Diversity and Novelty for Active Learning

arXiv:2312.15944v1 | 18 citations | h-index: 29 | Has Code | IEEE Trans Pattern Anal Mach Intell
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient data labeling in machine learning, presenting an incremental improvement over existing active learning methods.

The paper tackles the problem of balancing diversity and uncertainty in active learning to maximize performance within a labeling budget. It introduces the BAL framework, which outperforms established methods by 1.20% on benchmarks and, when 80% of the samples are labeled, achieves performance comparable to training on the full dataset.

The objective of Active Learning is to strategically label a subset of the dataset to maximize performance within a predetermined labeling budget. In this study, we harness features acquired through self-supervised learning. We introduce a straightforward yet potent metric, Cluster Distance Difference, to identify diverse data. Subsequently, we introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data. Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%. Moreover, we assess the efficacy of our proposed framework under extended settings, encompassing both larger and smaller labeling budgets. Experimental results demonstrate that, when labeling 80% of the samples, the performance of the current SOTA method declines by 0.74%, whereas our proposed BAL achieves performance comparable to the full dataset. Codes are available at https://github.com/JulietLJY/BAL.
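The abstract names Cluster Distance Difference and sub-pools without defining them, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes the metric is the gap between a sample's distance to its second-nearest and its nearest cluster center, that a small gap marks a sample worth considering, and that a fixed-size sub-pool is filtered by an externally supplied uncertainty score. The function names, `pool_size`, and `budget` are hypothetical, and the adaptive sizing of sub-pools described in the abstract is not modeled.

```python
import numpy as np


def cluster_distance_difference(features, centers):
    """Assumed definition: distance to the second-nearest center minus
    distance to the nearest center, per sample."""
    # Pairwise Euclidean distances, shape (n_samples, n_clusters).
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    # After sorting each row, column 0 is the nearest center and
    # column 1 is the runner-up.
    nearest_two = np.sort(dists, axis=1)[:, :2]
    return nearest_two[:, 1] - nearest_two[:, 0]


def select_from_sub_pool(cdd, uncertainty, pool_size, budget):
    """Hypothetical selection step: build a sub-pool from the samples with
    the smallest score, then keep the most uncertain ones inside it."""
    # Sub-pool: indices of the pool_size samples with the smallest score.
    sub_pool = np.argsort(cdd, kind="stable")[:pool_size]
    # Inside the sub-pool, rank by descending uncertainty and keep budget.
    order = np.argsort(-uncertainty[sub_pool], kind="stable")[:budget]
    return sub_pool[order]
```

In use, `features` would be the self-supervised embeddings, `centers` would come from any clustering step (for example k-means), and `uncertainty` from the current model (for example predictive entropy). All three are inputs this sketch takes as given.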

Code Implementations: 1 repo
