Manifold learning with arbitrary norms
This work generalizes the theory underlying graph-based manifold learning to arbitrary norms, which may improve the accuracy and efficiency of dimensionality reduction for high-dimensional data where a non-Euclidean metric is more appropriate.
This paper generalizes the theory of graph Laplacians in manifold learning to arbitrary norms, moving beyond the traditional Euclidean assumption. In a numerical simulation of molecular motion mapping, the authors show that a modified Laplacian eigenmaps algorithm based on the Earthmover's distance outperforms the classic Euclidean version, both in computational cost and in the sample size needed to recover the intrinsic geometry.
Manifold learning methods play a prominent role in nonlinear dimensionality reduction and other tasks involving high-dimensional data sets with low intrinsic dimensionality. Many of these methods are graph-based: they associate a vertex with each data point and a weighted edge with each pair. Existing theory shows that the Laplacian matrix of the graph converges to the Laplace-Beltrami operator of the data manifold, under the assumption that the pairwise affinities are based on the Euclidean norm. In this paper, we determine the limiting differential operator for graph Laplacians constructed using $\textit{any}$ norm. Our proof involves an interplay between the second fundamental form of the manifold and the convex geometry of the given norm's unit ball. To demonstrate the potential benefits of non-Euclidean norms in manifold learning, we consider the task of mapping the motion of large molecules with continuous variability. In a numerical simulation we show that a modified Laplacian eigenmaps algorithm, based on the Earthmover's distance, outperforms the classic Euclidean Laplacian eigenmaps, both in terms of computational cost and the sample size needed to recover the intrinsic geometry.
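The construction described above can be illustrated with a minimal sketch of Laplacian eigenmaps in which the norm is a pluggable function. This is not the paper's implementation: the function names (`pairwise_norm_distances`, `laplacian_eigenmaps`, `l1_norm`, `emd_1d_norm`), the Gaussian kernel, the bandwidth `sigma`, the symmetric normalization, and the toy circle data set are all assumptions chosen for illustration. The `emd_1d_norm` helper uses the standard fact that, for one-dimensional signals of equal total mass on a unit-spaced grid, the Earthmover's distance equals the $\ell_1$ norm of the cumulative sum of their difference; it stands in for whatever Earthmover's-type norm one actually uses.

```python
import numpy as np

def pairwise_norm_distances(X, norm):
    """Return the n x n matrix with entries norm(x_i - x_j) for rows x_i of X."""
    diffs = X[:, None, :] - X[None, :, :]        # shape (n, n, d)
    return np.apply_along_axis(norm, 2, diffs)   # apply the norm to each difference vector

def laplacian_eigenmaps(X, norm, sigma, n_components=2):
    """Laplacian eigenmaps with affinities based on an arbitrary norm (illustrative sketch)."""
    dist = pairwise_norm_distances(X, norm)
    W = np.exp(-(dist / sigma) ** 2)             # Gaussian affinities (an assumed kernel choice)
    np.fill_diagonal(W, 0.0)                     # no self-loops
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)     # eigenvalues in ascending order
    # Skip the trivial eigenvector (eigenvalue ~ 0); rescale to random-walk eigenvectors.
    embedding = d_inv_sqrt[:, None] * eigvecs[:, 1:n_components + 1]
    return eigvals[1:n_components + 1], embedding

def l1_norm(v):
    """The l1 norm, a simple non-Euclidean example."""
    return np.abs(v).sum()

def emd_1d_norm(v):
    """l1 norm of the cumulative sum; equals 1D Earthmover's distance for equal-mass signals."""
    return np.abs(np.cumsum(v)).sum()

# Toy data (hypothetical): 60 points on a unit circle in the plane.
theta = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

vals, emb = laplacian_eigenmaps(X, l1_norm, sigma=0.5)
print(emb.shape)
```

Swapping `l1_norm` for `emd_1d_norm`, or for any other function that satisfies the norm axioms, changes only the affinities; the rest of the pipeline is unchanged. According to the abstract, the limiting differential operator generally depends on the chosen norm, so embeddings obtained with different norms should not be expected to coincide.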