Learning by Active Nonlinear Diffusion
This work addresses the challenge of efficient labeling in high-dimensional, noisy datasets for applications such as remote sensing, though it appears incremental because it builds on existing diffusion-based methods.
The paper tackles the problem of active learning for high-dimensional data by using diffusion processes on graphs to capture intrinsic geometric structures, achieving high-accuracy labeling with only a small number of carefully selected labels. It demonstrates competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images.
This article proposes an active learning method for high-dimensional data, based on intrinsic data geometries learned through diffusion processes on graphs. Diffusion distances are used to parametrize low-dimensional structures on the dataset, which allow for high-accuracy labelings of the dataset with only a small number of carefully chosen labels. The geometric structure of the data suggests regions that have homogeneous labels, as well as regions with high label complexity that should be queried for labels. The proposed method enjoys theoretical performance guarantees on a general geometric data model, in which clusters corresponding to semantically meaningful classes are permitted to have nonlinear geometries and high ambient dimensionality, and to suffer from significant noise and outlier corruption. The proposed algorithm is implemented in a manner that is quasilinear in the number of unlabeled data points, and exhibits competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images.
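To make the notion of a diffusion distance concrete, the following is a minimal sketch, not the paper's implementation. It assumes a dense Gaussian-kernel graph, a row-normalized random walk on that graph, and the standard spectral formula for the diffusion distance at time t. The function name `diffusion_distances`, the parameters `sigma` and `t`, and the two-cluster toy data are all illustrative choices. The dense eigendecomposition here costs far more than the quasilinear implementation the abstract describes.

```python
import numpy as np

def diffusion_distances(X, sigma=1.0, t=10):
    """Pairwise diffusion distances at integer time t for the rows of X.

    Assumed construction (illustrative): Gaussian kernel weights and the
    random walk P = D^{-1} W on the resulting complete graph.
    """
    # Squared Euclidean distances and Gaussian kernel weights.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / sigma**2)
    deg = W.sum(axis=1)

    # S = D^{-1/2} W D^{-1/2} is symmetric and shares eigenvalues with P.
    S = W / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)

    # Right eigenvectors of P, scaled by eigenvalue^t to form the
    # diffusion-map coordinates; sqrt(deg.sum()) accounts for weighting
    # by the inverse stationary distribution deg / deg.sum().
    psi = evecs / np.sqrt(deg)[:, None]
    coords = np.sqrt(deg.sum()) * psi * evals**t

    # Diffusion distance = Euclidean distance between these coordinates.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

# Toy data: two well-separated Gaussian clusters in the plane.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
cluster_b = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(30, 2))
X = np.vstack([cluster_a, cluster_b])

D = diffusion_distances(X, sigma=1.0, t=10)
within = D[:30, :30].mean()    # average distance inside the first cluster
between = D[:30, 30:].mean()   # average distance across the two clusters
print(f"within-cluster: {within:.4f}, between-cluster: {between:.4f}")
```

On data like this, points in the same cluster should be much closer in diffusion distance than points in different clusters. That separation is the kind of structure the abstract refers to when it says the geometry suggests regions with homogeneous labels, so that a few well-placed label queries can cover a whole region.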