LG | ML | Feb 14, 2012

Semi-supervised Learning with Density Based Distances

arXiv:1202.3702v1 | 36 citations
AI Analysis

This paper addresses the problem of scaling semi-supervised learning to large datasets, which matters to practitioners. The contribution is incremental, since it builds on existing graph-based and distance-based techniques.

The paper tackles semi-supervised learning by estimating density-based distances from shortest paths on a graph. These distances can be plugged into distance-based methods such as Nearest Neighbor classifiers and SVMs, and the paper reports significant runtime improvements over Laplacian regularization on a large dataset.

We present a simple, yet effective, approach to Semi-Supervised Learning. Our approach is based on estimating density-based distances (DBD) using a shortest path calculation on a graph. These Graph-DBD estimates can then be used in any distance-based supervised learning method, such as Nearest Neighbor methods and SVMs with RBF kernels. In order to apply the method to very large data sets, we also present a novel algorithm which integrates nearest neighbor computations into the shortest path search and can find exact shortest paths even in extremely large dense graphs. Significant runtime improvement over the commonly used Laplacian regularization method is then shown on a large scale dataset.
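The idea of a graph-based density-based distance can be illustrated with a minimal sketch. It is not the paper's algorithm: it runs plain Dijkstra over a complete graph, and the edge weight `||x_i - x_j||**q` with `q = 2` is an assumption made here for illustration, as are the names `graph_dbd` and `nearest_label` and the toy data. With `q > 1`, a path made of many short hops through a dense region costs less than a single long jump across a gap.

```python
import heapq
import math


def graph_dbd(points, sources, q=2.0):
    """Shortest-path distances from each source index to every point.

    Assumed edge weight: Euclidean distance raised to the power q.
    Uses a complete graph, so this is only practical for small inputs.
    """
    n = len(points)
    result = {}
    for s in sources:
        dist = [math.inf] * n
        dist[s] = 0.0
        done = [False] * n
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if done[u]:
                continue  # stale heap entry
            done[u] = True
            for v in range(n):
                if done[v]:
                    continue
                w = math.dist(points[u], points[v]) ** q
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        result[s] = dist
    return result


def nearest_label(dbd, labels, i):
    """1-NN rule: label of the labeled point with the smallest DBD to point i."""
    best = min(labels, key=lambda s: dbd[s][i])
    return labels[best]


# Toy data: a dense chain starting at the "A" example, and an isolated "B" example.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6.5, 0)]
labels = {0: "A", 5: "B"}  # only two points are labeled

dbd = graph_dbd(points, sources=labels)
print(nearest_label(dbd, labels, 4))
```

In this toy set, point 4 is closer to the "B" example in Euclidean distance (2.5 versus 4), but the chain of unit-length hops connects it to the "A" example at lower graph cost, so the nearest-neighbor rule on these distances assigns it "A". The same distances could in principle be fed to any other distance-based learner.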
