LG SP MLApr 28, 2013

Semi-supervised Eigenvectors for Large-scale Locally-biased Learning

arXiv:1304.7528v15 citations

Originality Incremental advance

AI Analysis

This addresses the problem of local data analysis in large-scale settings for researchers and practitioners using eigenvector-based methods, representing an incremental improvement by adapting global eigenvectors to incorporate semi-supervised information.

The paper tackles the challenge of performing locally-biased machine learning tasks, such as clustering or partitioning near a seed set, by developing a methodology to construct semi-supervised eigenvectors of a graph Laplacian that capture orthogonal directions of variance correlated with the seed set, enabling efficient computation and application in various empirical examples.

In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks "nearby" that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified "seed set" of nodes, or one might be interested in finding partitions in an image that are near a prespecified "ground truth" set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We show that these semi-supervised eigenvectors can be computed quickly as the solution to a system of linear equations; and we also describe several variants of our basic method that have improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning; and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph Laplacian.

View on arXiv PDF

Similar