Revisiting clustering as matrix factorisation on the Stiefel manifold
This work addresses clustering problems in domains like images and gene expression, but it appears incremental as it builds on existing Burer-Monteiro factorization and Stiefel manifold methods.
The paper tackles clustering of high-dimensional data by reformulating it as low-rank matrix estimation within the PAC-Bayesian framework, proposing a new generalized Bayesian estimator and proving novel prediction bounds for clustering.
This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings), and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation, in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalized Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.
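To make the link between clustering and the Stiefel manifold concrete, here is a minimal Python/NumPy sketch. It assumes the common hard-clustering encoding in which a partition of n points into K clusters is written as a normalised membership matrix Z with Z[i, k] = 1/sqrt(n_k) when point i is in cluster k (n_k being the size of cluster k). This encoding is an illustrative assumption and not necessarily the paper's exact parametrisation. Such a Z has orthonormal columns (Z^T Z = I_K), so it is a point on the Stiefel manifold, and Z Z^T is a rank-K factorisation of the cluster-averaging matrix. The helper names (`normalised_membership`, `tangent_project`, `qr_retract`, `langevin_style_step`) and the toy potential in the usage lines are hypothetical. The move shown is a generic unadjusted Langevin-type step followed by a QR retraction, not the componentwise sampler from the paper.

```python
import numpy as np

def normalised_membership(labels, n_clusters):
    """Z[i, k] = 1/sqrt(n_k) if point i is in cluster k, else 0 (assumes no empty cluster)."""
    labels = np.asarray(labels)
    n = labels.shape[0]
    Z = np.zeros((n, n_clusters))
    Z[np.arange(n), labels] = 1.0
    sizes = Z.sum(axis=0)          # cluster sizes n_k
    return Z / np.sqrt(sizes)      # scale column k by 1/sqrt(n_k)

def on_stiefel(X, tol=1e-10):
    """Check the Stiefel constraint X^T X = I_K."""
    return np.allclose(X.T @ X, np.eye(X.shape[1]), atol=tol)

def tangent_project(X, G):
    """Project G onto the tangent space at X (embedded metric): G - X sym(X^T G)."""
    XtG = X.T @ G
    return G - X @ (XtG + XtG.T) / 2

def qr_retract(Y):
    """Map a full-column-rank n x K matrix back onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(Y)
    signs = np.sign(np.diag(R))    # fix column signs so diag(R) > 0
    signs[signs == 0] = 1.0
    return Q * signs

def langevin_style_step(X, grad_U, step, rng):
    """Hypothetical unadjusted Langevin-type move: tangent drift + tangent noise, then retract.
    Illustrative only; it is not an exact sampler of any target distribution."""
    drift = -step * tangent_project(X, grad_U(X))
    noise = np.sqrt(2.0 * step) * tangent_project(X, rng.standard_normal(X.shape))
    return qr_retract(X + drift + noise)
```

Usage, with a toy potential U(X) = -tr(X^T A X) built from random data (an assumption for illustration): `Z = normalised_membership([0, 0, 1, 2, 2, 2], 3)` gives a point with `on_stiefel(Z)` true, and `langevin_style_step(Z, lambda X: -2 * A @ X, 0.01, np.random.default_rng(0))` returns another point that still satisfies the constraint. The QR retraction is chosen here only because it is cheap and keeps every iterate exactly orthonormal; other retractions (e.g. polar) would serve the same purpose.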