Cautious Active Clustering
This addresses the challenge of efficient label querying for classification in scenarios with unknown distributions, offering a method that is incremental in its approach to active learning.
The paper tackles the problem of classifying points from an unknown probability measure by querying very few labels, using a localized kernel based on Hermite polynomials to estimate class supports without assumptions on the measures or number of classes. It provides theoretical guarantees measured by the F-score, with examples including hyper-spectral images and MNIST classification.
We consider the problem of classification of points sampled from an unknown probability measure on a Euclidean space. We study the question of querying the class label at a very small number of judiciously chosen points so as to be able to attach the appropriate class label to every point in the set. Our approach is to consider the unknown probability measure as a convex combination of the conditional probabilities for each class. Our technique involves the use of a highly localized kernel constructed from Hermite polynomials, in order to create a hierarchical estimate of the supports of the constituent probability measures. We do not need to make any assumptions on the nature of any of the probability measures nor know in advance the number of classes involved. We give theoretical guarantees measured by the $F$-score for our classification scheme. Examples include classification in hyper-spectral images and MNIST classification.