A Two-Stage Active Learning Algorithm for $k$-Nearest Neighbors
This addresses a gap in machine learning for researchers and practitioners needing efficient active learning methods for k-nearest neighbor classification, though it is incremental as it builds on existing theory.
The paper tackles the lack of active learning strategies for k-nearest neighbor classifiers by introducing a two-stage algorithm that retains the k-nearest neighbor voting concept, proving it achieves faster asymptotic convergence to the Bayes optimal classifier under smoothness and noise conditions compared to passive training.
$k$-nearest neighbor classification is a popular non-parametric method because of desirable properties like automatic adaption to distributional scale changes. Unfortunately, it has thus far proved difficult to design active learning strategies for the training of local voting-based classifiers that naturally retain these desirable properties, and hence active learning strategies for $k$-nearest neighbor classification have been conspicuously missing from the literature. In this work, we introduce a simple and intuitive active learning algorithm for the training of $k$-nearest neighbor classifiers, the first in the literature which retains the concept of the $k$-nearest neighbor vote at prediction time. We provide consistency guarantees for a modified $k$-nearest neighbors classifier trained on samples acquired via our scheme, and show that when the conditional probability function $\mathbb{P}(Y=y|X=x)$ is sufficiently smooth and the Tsybakov noise condition holds, our actively trained classifiers converge to the Bayes optimal classifier at a faster asymptotic rate than passively trained $k$-nearest neighbor classifiers.