CV · Mar 14, 2022

Active Learning by Feature Mixing

arXiv:2203.07034v1 · 126 citations · h-index: 80
Originality: Highly original
AI Analysis

This paper addresses the problem of reducing labelling costs in machine learning for image, video, and non-visual data. It represents a strong, specific gain rather than a new paradigm.

The paper tackles the challenge of selecting valuable examples for annotation in active learning with high-dimensional data. It proposes ALFA-Mix, which outperforms recent approaches in 30 settings across 12 benchmarks, with the largest gains in low-data regimes and on self-trained vision transformers.

The promise of active learning (AL) is to reduce labelling costs by selecting the most valuable examples to annotate from a pool of unlabelled data. Identifying these examples is especially challenging with high-dimensional data (e.g. images, videos) and in low-data regimes. In this paper, we propose a novel method for batch AL called ALFA-Mix. We identify unlabelled instances with sufficiently distinct features by seeking inconsistencies in predictions resulting from interventions on their representations. We construct interpolations between representations of labelled and unlabelled instances and then examine the predicted labels. We show that inconsistencies in these predictions help discover features that the model is unable to recognise in the unlabelled instances. We derive an efficient implementation based on a closed-form solution to the optimal interpolation causing changes in predictions. Our method outperforms all recent AL approaches in 30 different settings on 12 benchmarks of images, videos, and non-visual data. The improvements are especially significant in low-data regimes and on self-trained vision transformers, where ALFA-Mix outperforms the state of the art in 59% and 43% of the experiments respectively.
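The feature-mixing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a fixed interpolation weight `alpha` instead of the closed-form optimal interpolation the paper derives, it assumes a linear classifier head (`W`, `b`), and it assumes the labelled representations are summarised by one mean "anchor" per class. The function name `find_inconsistent` and all parameter names are hypothetical.

```python
import numpy as np

def find_inconsistent(z_unlabelled, z_labelled, y_labelled, W, b, alpha=0.2):
    """Flag unlabelled points whose predicted label flips under feature mixing.

    Simplified sketch: fixed `alpha`, linear head, per-class mean anchors.
    All of these are illustrative assumptions, not the paper's exact method.
    """
    def predict(z):
        # Hypothetical linear head: logits = z @ W + b, label = argmax.
        return np.argmax(z @ W + b, axis=-1)

    # One anchor per class: the mean labelled representation (assumption).
    classes = np.unique(y_labelled)
    anchors = np.stack([z_labelled[y_labelled == c].mean(axis=0) for c in classes])

    base = predict(z_unlabelled)  # shape (N,)

    # mixed[n, k] = alpha * anchor_k + (1 - alpha) * z_n, shape (N, K, D)
    mixed = alpha * anchors[None, :, :] + (1 - alpha) * z_unlabelled[:, None, :]
    mixed_pred = predict(mixed)  # shape (N, K)

    # A point is a candidate if mixing with any anchor changes its label.
    return (mixed_pred != base[:, None]).any(axis=1)

# Toy 2-D example with an identity head: the label is the larger coordinate.
W, b = np.eye(2), np.zeros(2)
z_lab = np.array([[2.0, 0.0], [4.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
y_lab = np.array([0, 0, 1, 1])
z_unl = np.array([[5.0, 0.0], [1.0, 0.9]])  # far from / near the boundary
mask = find_inconsistent(z_unl, z_lab, y_lab, W, b)
print(mask)
```

In this toy setup the first unlabelled point keeps its label under both mixes, while the second, which sits near the decision boundary, flips when mixed with the class-1 anchor and would be a candidate for annotation. Batch selection among the candidates is omitted here.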

Code Implementations: 4 repos
