LG · Sep 24, 2024

Self-Supervised Graph Embedding Clustering

Fangfang Li, Quanxue Gao, Cheng Deng, Wei Xia

arXiv:2409.15887v2 · 2.62 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses clustering challenges in machine learning by improving class balance and reducing hyperparameter dependency, though it appears incremental as it builds on existing K-means and manifold learning methods.

The paper tackles the limitations of one-step K-means clustering by proposing a self-supervised graph embedding framework that integrates manifold learning with K-means. The framework removes the need for explicit centroids and for balancing hyperparameters, and the authors report excellent performance on multiple datasets.

One-step methods that combine K-means with dimensionality reduction have made some progress against the curse of dimensionality in clustering tasks. However, because they optimize K-means clustering and dimensionality reduction jointly, their clustering quality is limited by the hyperparameters this introduces and by the initialization of the cluster centers. Moreover, maintaining class balance during clustering remains challenging. To overcome these issues, we propose a unified, self-supervised graph embedding framework that integrates manifold learning with K-means. Specifically, we establish a connection between K-means and the manifold structure, which allows us to perform K-means without explicitly defining centroids. We then use this centroid-free K-means to generate labels in the low-dimensional space and use the label information to determine the similarity between samples. This keeps the manifold structure and the labels consistent. Our model achieves one-step clustering without redundant balancing hyperparameters. Notably, we find that maximizing the $\ell_{2,1}$-norm naturally maintains class balance during clustering, and we prove this result theoretically. Finally, experiments on multiple datasets show that Our-LPP and Our-MFA deliver excellent and reliable clustering performance.
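
Two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' formulation. The first part uses the standard identity that the K-means cost of a cluster equals its summed pairwise squared distances divided by twice its size, which is one common way to evaluate K-means without centroids. The second part assumes the $\ell_{2,1}$-norm is taken over a scaled indicator matrix G = Y (Y^T Y)^{-1/2}; the abstract does not say which matrix the paper uses, so that choice and all function names here are hypothetical.

```python
import numpy as np


def kmeans_cost_with_centroids(X, labels):
    """Classic K-means cost: squared distance of each point to its cluster mean."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    cost = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        cost += np.sum((members - centroid) ** 2)
    return cost


def kmeans_cost_centroid_free(X, labels):
    """Same cost computed from pairwise distances only; no centroid is formed."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    cost = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        diffs = members[:, None, :] - members[None, :, :]
        sq_dists = np.sum(diffs ** 2, axis=-1)
        # Each unordered pair is counted twice, hence the factor 2 * n_k.
        cost += sq_dists.sum() / (2 * len(members))
    return cost


def l21_of_scaled_indicator(labels):
    """l2,1-norm (sum of row l2-norms) of G = Y (Y^T Y)^{-1/2}, Y one-hot.

    Assumption: this matrix is chosen for illustration only. Row i of G has
    a single nonzero entry 1/sqrt(n_k), so the norm equals sum_k sqrt(n_k).
    """
    _, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    n = len(inverse)
    G = np.zeros((n, len(counts)))
    G[np.arange(n), inverse] = 1.0 / np.sqrt(counts[inverse])
    return np.linalg.norm(G, axis=1).sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
labels = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])

# The two costs agree up to floating-point error.
print(kmeans_cost_with_centroids(X, labels))
print(kmeans_cost_centroid_free(X, labels))

# Balanced split of 10 points into 2 clusters vs. a 9/1 split.
print(l21_of_scaled_indicator([0] * 5 + [1] * 5))
print(l21_of_scaled_indicator([0] * 9 + [1]))
```

Under this assumed G, the norm reduces to the sum of the square roots of the cluster sizes. Because the square root is concave, that sum is largest when the sizes are equal: the 5/5 split gives 2·sqrt(5) ≈ 4.47, while the 9/1 split gives 3 + 1 = 4. This is consistent with the abstract's claim that maximizing the $\ell_{2,1}$-norm favors balanced clusters, though the paper's own proof may proceed differently.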
