Sampling-enabled scalable manifold learning unveils the discriminative cluster structure of high-dimensional data
This work addresses scalability and cluster distortion issues in manifold learning for applications like single-cell data analysis and ECG anomaly detection, representing an incremental improvement over existing methods.
The paper addresses two limitations of manifold learning on high-dimensional data: distortion of cluster structure and poor scalability. It proposes SUDE, a sampling-based technique that improves cluster separation and integrity and remains robust as the sampling rate is reduced.
As a pivotal branch of machine learning, manifold learning uncovers the intrinsic low-dimensional structure of complex nonlinear manifolds in high-dimensional space, supporting visualization, classification, clustering, and the extraction of key insights. Although existing techniques have achieved remarkable successes, they suffer from extensive distortions of cluster structure, which hinder the understanding of underlying patterns. Scalability issues also limit their applicability to large-scale data. We therefore propose a sampling-based Scalable manifold learning technique that enables Uniform and Discriminative Embedding, namely SUDE, for large-scale and high-dimensional data. It first seeks a set of landmarks to construct the low-dimensional skeleton of the entire dataset, and then incorporates the non-landmarks into the learned space based on constrained locally linear embedding (CLLE). We empirically validated the effectiveness of SUDE on synthetic datasets and real-world benchmarks, and applied it to analyze single-cell data and detect anomalies in electrocardiogram (ECG) signals. SUDE exhibits a distinct advantage in scalability with respect to data size and embedding dimension, and shows promising performance in cluster separation, integrity, and global structure preservation. The experiments also demonstrate notable robustness in embedding quality as the sampling rate decreases.
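The two-stage idea described above (embed a small set of landmarks, then place the remaining points relative to them) can be illustrated with a minimal sketch. This is not the SUDE implementation. It rests on several assumptions that are not taken from the paper: landmarks are drawn by uniform random sampling, the landmark skeleton is obtained with plain PCA as a stand-in for the actual landmark embedding, and non-landmarks are placed with standard sum-to-one LLE reconstruction weights rather than the paper's CLLE constraints. The function names `lle_weights` and `embed_with_landmarks` are hypothetical.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Sum-to-one weights that reconstruct x from its neighbors (standard LLE)."""
    k = len(neighbors)
    diff = neighbors - x                      # (k, d) offsets from x
    gram = diff @ diff.T                      # (k, k) local Gram matrix
    # Regularize so the system stays solvable when k > d or points coincide.
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))
    return w / w.sum()


def embed_with_landmarks(X, n_landmarks, n_components=2, k=5, seed=0):
    """Embed landmarks first, then place every other point relative to them."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Assumption: uniform random sampling stands in for the landmark search.
    landmark_idx = rng.choice(n, size=n_landmarks, replace=False)
    landmarks = X[landmark_idx]

    # Assumption: PCA stands in for the landmark (skeleton) embedding.
    mean = landmarks.mean(axis=0)
    _, _, vt = np.linalg.svd(landmarks - mean, full_matrices=False)
    Y_land = (landmarks - mean) @ vt[:n_components].T

    Y = np.empty((n, n_components))
    Y[landmark_idx] = Y_land

    # Place each non-landmark as a weighted combination of the embedded
    # positions of its k nearest landmarks, reusing the high-dimensional weights.
    for i in np.setdiff1d(np.arange(n), landmark_idx):
        dist = np.linalg.norm(landmarks - X[i], axis=1)
        nn = np.argsort(dist)[:k]
        w = lle_weights(X[i], landmarks[nn])
        Y[i] = w @ Y_land[nn]
    return Y, landmark_idx
```

In this sketch the per-point cost of the second stage depends on the number of landmarks rather than on the full data size, which conveys why a sampling-based scheme can scale. The choice of sampler and of the landmark embedding is where an actual method would differ most from this illustration.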