Clustering by Low-Rank Doubly Stochastic Matrix Decomposition
This work addresses clustering analysis for data scientists and offers a new method that makes an incremental advance in nonnegative low-rank approximation.
The authors tackled the problem of improving clustering performance beyond traditional matrix factorization methods by proposing a low-rank doubly stochastic matrix decomposition approach, which achieved strong clustering purity results, particularly on large-scale manifold datasets.
Clustering analysis by nonnegative low-rank approximations has achieved remarkable progress in the past decade. However, most approximation approaches in this direction are still restricted to matrix factorization. We propose a new low-rank learning method that goes beyond matrix factorization to improve clustering performance. The approximation is based on a two-step bipartite random walk through virtual cluster nodes, so that it is formed only from cluster-assigning probabilities. Minimizing the approximation error measured by Kullback-Leibler divergence is equivalent to maximizing the likelihood of a discriminative model, which endows our method with a solid probabilistic interpretation. The optimization is implemented by a relaxed Majorization-Minimization algorithm that is advantageous for finding good local minima. Furthermore, we point out that the regularized algorithm with a Dirichlet prior serves only as initialization. Experimental results show that the new method achieves strong clustering purity on various datasets, especially large-scale manifold data.
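The two-step random walk can be written out explicitly. The derivation that follows is a sketch under stated assumptions: $A$ is a nonnegative, symmetric $n \times n$ similarity matrix, $W$ is an $n \times r$ row-stochastic matrix with $W_{ik} = P(k \mid i)$ (the cluster-assigning probabilities), and data points have a uniform prior, so Bayes' rule gives the return step $P(j \mid k)$. The symbols $A$, $W$, $n$, $r$ are notation introduced here.

```latex
\begin{aligned}
P(k \mid i) &= W_{ik}, \qquad
P(j \mid k) = \frac{W_{jk}}{\sum_{v} W_{vk}}, \\
\widehat{A}_{ij} &= \sum_{k=1}^{r} P(k \mid i)\, P(j \mid k)
  = \sum_{k=1}^{r} \frac{W_{ik} W_{jk}}{\sum_{v} W_{vk}}, \\
\sum_{j} \widehat{A}_{ij} &= \sum_{k=1}^{r} W_{ik} = 1
  \quad\Longrightarrow\quad \sum_{i,j} \widehat{A}_{ij} = n, \\
D_{\mathrm{KL}}(A \,\|\, \widehat{A})
  &= \sum_{i,j} \Big( A_{ij} \log \frac{A_{ij}}{\widehat{A}_{ij}} - A_{ij} + \widehat{A}_{ij} \Big)
   = -\sum_{i,j} A_{ij} \log \widehat{A}_{ij} + \text{const}.
\end{aligned}
```

Because $\widehat{A}$ is symmetric with unit row sums, it is doubly stochastic, and its rank is at most $r$. The last line shows why, under this reading, minimizing the KL divergence is the same as maximizing $\sum_{i,j} A_{ij} \log P(j \mid i)$, the log-likelihood of the model $P(j \mid i) = \widehat{A}_{ij}$ when $A_{ij}$ is treated as weighted counts of observed pairs.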
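To make the construction concrete, the sketch that follows computes $\widehat{A}_{ij} = \sum_k W_{ik} W_{jk} / \sum_v W_{vk}$ from a row-stochastic matrix $W$ of cluster-assigning probabilities, evaluates the generalized KL divergence, and fits $W$ on a toy graph. It is illustrative only and rests on assumptions: pure Python with nested lists for small $n$, a symmetric nonnegative similarity matrix `A`, and hypothetical helper names (`approx`, `kl_objective`, `gradient`, `fit`, `assign`). Most importantly, it swaps in plain exponentiated-gradient descent on each row of $W$ in place of the relaxed Majorization-Minimization algorithm described in the abstract, so it says nothing about that algorithm's ability to find good local minima.

```python
"""Low-rank doubly stochastic approximation: a minimal illustrative sketch.

Assumptions (not from the source text): A is a small symmetric, nonnegative
similarity matrix stored as nested lists; W is an n x r row-stochastic matrix
of cluster-assigning probabilities. The optimizer is plain exponentiated-
gradient (mirror) descent, a swapped-in technique, NOT the authors' relaxed
Majorization-Minimization algorithm.
"""
import math
import random


def col_sums(W):
    """s[k] = sum_v W[v][k], the total mass assigned to cluster k."""
    return [sum(row[k] for row in W) for k in range(len(W[0]))]


def approx(W):
    """A_hat[i][j] = sum_k W[i][k] * W[j][k] / s[k] (two-step random walk)."""
    n, r, s = len(W), len(W[0]), col_sums(W)
    return [[sum(W[i][k] * W[j][k] / s[k] for k in range(r)) for j in range(n)]
            for i in range(n)]


def kl_objective(A, W, eps=1e-12):
    """Generalized KL divergence D(A || A_hat); eps guards against log(0)."""
    total = 0.0
    for a_row, h_row in zip(A, approx(W)):
        for a, h in zip(a_row, h_row):
            if a > 0:
                total += a * math.log(a / (h + eps))
            total += h - a
    return total


def gradient(A, W, eps=1e-12):
    """Gradient of -sum_ij A_ij log A_hat_ij with respect to every W[a][b]."""
    n, r, s = len(W), len(W[0]), col_sums(W)
    A_hat = approx(W)
    Z = [[A[i][j] / (A_hat[i][j] + eps) for j in range(n)] for i in range(n)]
    G = [[0.0] * r for _ in range(n)]
    for b in range(r):
        # Term from differentiating 1 / s[b]; shared by every row a.
        q = sum(Z[i][j] * W[i][b] * W[j][b] for i in range(n) for j in range(n))
        for a in range(n):
            row = sum(Z[a][j] * W[j][b] for j in range(n))
            col = sum(Z[i][a] * W[i][b] for i in range(n))
            G[a][b] = -((row + col) / s[b] - q / s[b] ** 2)
    return G


def fit(A, r, steps=300, lr=0.01, seed=0):
    """Exponentiated-gradient descent; every row of W stays on the simplex.

    Each row of A_hat sums to 1, so sum_ij A_hat_ij = n is constant on this
    constraint set and minimizing -sum A log A_hat also minimizes the KL.
    """
    rng = random.Random(seed)
    W = []
    for _ in range(len(A)):
        row = [rng.random() + 0.1 for _ in range(r)]
        z = sum(row)
        W.append([x / z for x in row])
    for _ in range(steps):
        G = gradient(A, W)
        for a, w_row in enumerate(W):
            row = [w * math.exp(-lr * g) for w, g in zip(w_row, G[a])]
            z = sum(row)
            W[a] = [x / z for x in row]
    return W


def assign(W):
    """Hard cluster label per point: argmax_k W[i][k]."""
    return [max(range(len(w)), key=w.__getitem__) for w in W]


if __name__ == "__main__":
    # Toy similarity graph: two disconnected triangles with self-loops.
    A = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)]
         for i in range(6)]
    W = fit(A, r=2)
    print("KL:", kl_objective(A, W))
    print("labels:", assign(W))
```

Running the script prints the final divergence and one hard label per point. As with any local method, the result can depend on the seed and step size; the entropic (exponentiated) update is used here only because it keeps each row of $W$ nonnegative and summing to 1 without an explicit projection.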
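The abstract reports results in clustering purity without defining it. Assuming the commonly used definition (each predicted cluster is credited with the count of its most frequent ground-truth class, and the credited counts are divided by the number of points), a small sketch follows; the function name `purity` is hypothetical.

```python
from collections import Counter


def purity(labels_pred, labels_true):
    """Purity = (1/n) * sum over predicted clusters of the size of the
    largest ground-truth class inside that cluster."""
    if len(labels_pred) != len(labels_true):
        raise ValueError("label lists must have equal length")
    if not labels_pred:
        raise ValueError("empty labeling")
    # Count ground-truth classes inside each predicted cluster.
    clusters = {}
    for p, t in zip(labels_pred, labels_true):
        clusters.setdefault(p, Counter())[t] += 1
    hits = sum(c.most_common(1)[0][1] for c in clusters.values())
    return hits / len(labels_pred)


# Cluster 0 holds {a, a, b}; cluster 1 holds {b, b, b}; 5 of 6 points credited.
print(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))  # prints 0.8333333333333334
```

Purity rewards fine-grained clusterings (singleton clusters always score 1.0), so it is usually compared at a fixed number of clusters.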