LG
Nov 2, 2017

Deep Active Learning over the Long Tail

arXiv:1711.00941v1
161 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of efficient data labeling in deep learning, particularly for long-tail distributions, offering a method that reduces sample complexity compared to random sampling and softmax-based uncertainty sampling.

The paper tackles pool-based active learning for deep neural networks. It introduces an algorithm that queries consecutive points from the pool using farthest-first traversals in neural activation space. The authors report consistent and overwhelming improvement in sample complexity over passive learning (random sampling), and better results than uncertainty sampling, on MNIST, CIFAR-10, and CIFAR-100.

This paper is concerned with pool-based active learning for deep neural networks. Motivated by coreset dataset compression ideas, we present a novel active learning algorithm that queries consecutive points from the pool using farthest-first traversals in the space of neural activation over a representation layer. We show consistent and overwhelming improvement in sample complexity over passive learning (random sampling) for three datasets: MNIST, CIFAR-10, and CIFAR-100. In addition, our algorithm outperforms the traditional uncertainty sampling technique (obtained using softmax activations), and we identify cases where uncertainty sampling is only slightly better than random sampling.
