LG · STML · May 30, 2019

Learning by Active Nonlinear Diffusion

arXiv:1905.12989v1 · 20 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficiently labeling high-dimensional, noisy datasets in applications such as remote sensing, though it appears incremental because it builds on existing diffusion-based methods.

The paper tackles the problem of active learning for high-dimensional data by using diffusion processes on graphs to capture intrinsic geometric structures, achieving high-accuracy labeling with only a small number of carefully selected labels. It demonstrates competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images.

This article proposes an active learning method for high-dimensional data, based on intrinsic data geometries learned through diffusion processes on graphs. Diffusion distances are used to parametrize low-dimensional structures on the dataset, which allow for high-accuracy labelings of the dataset with only a small number of carefully chosen labels. The geometric structure of the data suggests regions that have homogeneous labels, as well as regions with high label complexity that should be queried for labels. The proposed method enjoys theoretical performance guarantees on a general geometric data model, in which clusters corresponding to semantically meaningful classes are permitted to have nonlinear geometries and high ambient dimensionality, and to suffer from significant noise and outlier corruption. The proposed algorithm is implemented in a manner that is quasilinear in the number of unlabeled data points, and exhibits competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images.
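To make the notion of diffusion distance concrete, the following is a minimal sketch of a standard diffusion-map construction, not the paper's implementation. The Gaussian kernel, the bandwidth `sigma`, the diffusion time `t`, the number of retained eigenvectors, and the function names `diffusion_map` and `diffusion_distances` are all assumptions chosen for illustration. The dense eigendecomposition used here scales cubically in the number of points, so it does not reflect the quasilinear implementation the abstract describes.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, t=10, n_components=3):
    """Embed the rows of X so that Euclidean distance in the embedding
    approximates diffusion distance at time t (illustrative sketch)."""
    # Pairwise squared Euclidean distances, clipped at zero for round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel weights (assumed kernel; fully connected graph).
    W = np.exp(-d2 / sigma ** 2)
    deg = W.sum(axis=1)
    pi = deg / deg.sum()  # stationary distribution of the random walk
    # Symmetric conjugate of the Markov matrix P = D^{-1} W.
    S = W / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]  # sort eigenpairs in descending order
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of P, normalized with respect to pi.
    psi = evecs / np.sqrt(pi)[:, None]
    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    return (evals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]

def diffusion_distances(embedding):
    """Pairwise Euclidean distances between rows of the embedding."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic clusters, 20 points each (toy data, not from the paper).
    X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                   rng.normal(0.0, 0.3, (20, 2)) + [3.0, 0.0]])
    D = diffusion_distances(diffusion_map(X, n_components=2))
    print("max within-cluster distance:", max(D[:20, :20].max(), D[20:, 20:].max()))
    print("min between-cluster distance:", D[:20, 20:].min())
```

On well-separated clusters such as this toy example, within-cluster diffusion distances should come out much smaller than between-cluster ones, which is the property the abstract relies on when it speaks of regions with homogeneous labels.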

