CV
Jan 19, 2022

PT4AL: Using Self-Supervised Pretext Tasks for Active Learning

arXiv:2201.07459v3
49 citations
AI Analysis

This paper addresses the cost of data labeling for machine learning practitioners. It offers an incremental improvement by integrating self-supervised learning into active learning.

The paper tackles the problem of expensive data labeling by proposing an active learning approach that uses self-supervised pretext tasks to select informative data. It reports compelling performance on CIFAR10, Caltech-101, ImageNet, and Cityscapes.

Labeling a large set of data is expensive. Active learning aims to tackle this problem by asking to annotate only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated to the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data in a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performances on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem where active learning performance is affected by the randomly sampled initial labeled set.
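The sampling procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`split_by_pretext_loss`, `select_most_uncertain`) are hypothetical, the descending sort order is an assumption, and "uncertainty" is assumed here to mean the lowest top-1 predicted probability, since the abstract does not specify either detail. The pretext losses and class probabilities are toy values standing in for the outputs of a trained pretext task learner and main task model.

```python
import numpy as np


def split_by_pretext_loss(pretext_losses, num_batches):
    """Sort unlabeled sample indices by pretext loss and split them into batches.

    Assumption: samples are ordered from highest to lowest loss, so the
    first batch holds the samples the pretext learner found hardest.
    """
    losses = np.asarray(pretext_losses)
    order = np.argsort(-losses, kind="stable")
    return np.array_split(order, num_batches)


def select_most_uncertain(batch_indices, class_probs, budget):
    """Return the `budget` samples in a batch with the lowest top-1 confidence.

    Assumption: uncertainty is measured by the maximum predicted class
    probability of the main task model (lower means more uncertain).
    """
    probs = np.asarray(class_probs)
    confidence = probs[batch_indices].max(axis=1)
    picked = np.argsort(confidence, kind="stable")[:budget]
    return batch_indices[picked]


# Toy inputs: six unlabeled samples, two classes.
pretext_losses = [0.1, 0.9, 0.5, 0.7, 0.3, 0.2]
class_probs = [
    [0.80, 0.20],
    [0.90, 0.10],
    [0.60, 0.40],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.95, 0.05],
]

batches = split_by_pretext_loss(pretext_losses, num_batches=2)
chosen = select_most_uncertain(batches[0], class_probs, budget=1)
```

In a full active learning loop, each iteration would take the next batch, query labels for the selected samples, and retrain the main task model before scoring the following batch. The first iteration has no trained main task model, so it would need some other selection rule for its batch.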

Code Implementations: 1 repo
