LGApr 30, 2021

Divide-and-conquer based Large-Scale Spectral Clustering

arXiv:2104.15042v224 citationsHas Code
Originality Incremental advance
AI Analysis

This work addresses the problem of scalable clustering for researchers and practitioners dealing with large datasets, but it appears incremental as it builds on existing spectral clustering techniques.

The paper tackles the challenge of balancing efficiency and effectiveness in large-scale spectral clustering with limited computing resources by proposing a divide-and-conquer method that uses landmark selection and an approximate similarity matrix to reduce computational complexity, achieving lower complexity than most existing methods as demonstrated on ten large-scale datasets.

Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long time. In this paper, we propose a divide-and-conquer based large-scale spectral clustering method to strike a good balance between efficiency and effectiveness. In the proposed method, a divide-and-conquer based landmark selection algorithm and a novel approximate similarity matrix approach are designed to construct a sparse similarity matrix within low computational complexities. Then clustering results can be computed quickly through a bipartite graph partition process. The proposed method achieves a lower computational complexity than most existing large-scale spectral clustering methods. Experimental results on ten large-scale datasets have demonstrated the efficiency and effectiveness of the proposed method. The MATLAB code of the proposed method and experimental datasets are available at https://github.com/Li-Hongmin/MyPaperWithCode.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes