Deep Active Learning over the Long Tail
This work addresses the challenge of efficient data labeling in deep learning, particularly for long-tail distributions, offering a method that needs fewer labeled samples than passive learning (random sampling) and that outperforms uncertainty sampling.
The paper tackles pool-based active learning for deep neural networks. It introduces a novel algorithm that queries consecutive points from the pool using farthest-first traversals in the space of neural activations. On MNIST, CIFAR-10, and CIFAR-100, the algorithm shows consistent and overwhelming improvement in sample complexity over passive learning and outperforms uncertainty sampling.
This paper is concerned with pool-based active learning for deep neural networks. Motivated by coreset dataset compression ideas, we present a novel active learning algorithm that queries consecutive points from the pool using farthest-first traversals in the space of neural activation over a representation layer. We show consistent and overwhelming improvement in sample complexity over passive learning (random sampling) for three datasets: MNIST, CIFAR-10, and CIFAR-100. In addition, our algorithm outperforms the traditional uncertainty sampling technique (obtained using softmax activations), and we identify cases where uncertainty sampling is only slightly better than random sampling.
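The farthest-first traversal mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the representation-layer activations have already been extracted into a NumPy array, uses Euclidean distance, and picks an arbitrary starting point when nothing is labeled yet. The function name `farthest_first_traversal` and its parameters are hypothetical.

```python
import numpy as np


def farthest_first_traversal(embeddings, labeled_idx, budget):
    """Greedily pick `budget` pool points, each the farthest from all points chosen so far.

    embeddings  : (n, d) array of activations (assumed precomputed from a representation layer).
    labeled_idx : indices of points that are already labeled.
    budget      : number of new points to query.
    Distances are Euclidean, which is an assumption of this sketch.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n = embeddings.shape[0]
    labeled_idx = list(labeled_idx)

    # Distance from every point to its nearest already-labeled point.
    min_dist = np.full(n, np.inf)
    for i in labeled_idx:
        dist_to_i = np.linalg.norm(embeddings - embeddings[i], axis=1)
        min_dist = np.minimum(min_dist, dist_to_i)

    # Labeled points must never be queried again.
    min_dist[labeled_idx] = -np.inf

    selected = []
    for _ in range(min(budget, n - len(labeled_idx))):
        # The next query is the point farthest from the current labeled set.
        idx = int(np.argmax(min_dist))
        selected.append(idx)

        # Update nearest-center distances with the newly chosen point.
        dist_to_new = np.linalg.norm(embeddings - embeddings[idx], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[idx] = -np.inf

    return selected
```

In an active learning loop, the returned indices would be sent for labeling, the network retrained, and the activations recomputed before the next round. Because each step selects the point that is least covered by the current labeled set, the queries spread across the activation space rather than concentrating near the decision boundary, as uncertainty sampling tends to do.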