LG | ML | Dec 16, 2020

Clustering Ensemble Meets Low-rank Tensor Approximation

arXiv:2012.08916v1 | 42 citations
AI Analysis

This work provides a novel approach to improve clustering ensemble performance for machine learning practitioners by mitigating the dominance of poor base clusterings.

This paper addresses clustering ensemble, which combines multiple base clusterings to obtain a better result than any single one. The authors propose a low-rank tensor approximation method: a coherent-link matrix and a co-association matrix are stacked into a 3D tensor, and the tensor's low-rank structure is used to refine the co-association matrix. The authors report that the method outperforms 12 state-of-the-art methods on 7 benchmark datasets.

This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than any individual base clustering. Existing clustering ensemble methods generally construct a co-association matrix, which indicates the pairwise similarity between samples, as a weighted linear combination of the connective matrices from the different base clusterings. The resulting co-association matrix is then used as the input to an off-the-shelf clustering algorithm, e.g., spectral clustering. However, the co-association matrix may be dominated by poor base clusterings, resulting in inferior performance. In this paper, we propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective. Specifically, by inspecting whether two samples are assigned to the same cluster under different base clusterings, we derive a coherent-link matrix, which contains limited but highly reliable relationships between samples. We then stack the coherent-link matrix and the co-association matrix to form a three-dimensional tensor, whose low-rank property is exploited to propagate the information in the coherent-link matrix to the co-association matrix, producing a refined co-association matrix. We formulate the proposed method as a convex constrained optimization problem and solve it efficiently. Experimental results on 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance compared with 12 state-of-the-art methods. To the best of our knowledge, this is the first work to explore the potential of low-rank tensors for clustering ensemble, and it is fundamentally different from previous approaches.
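
The construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. The function names (`connective_matrix`, `co_association`, `coherent_link`, `stack_tensor`) are hypothetical. The weights default to a uniform average. The coherent-link matrix is assumed here to keep only pairs that share a cluster in every base clustering, which is one plausible reading of "limited but highly reliable relationships". The low-rank tensor optimization itself is not shown.

```python
import numpy as np

def connective_matrix(labels):
    """Binary n x n matrix: entry (i, j) is 1 if samples i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def co_association(base_clusterings, weights=None):
    """Weighted linear combination of the connective matrices.

    Assumption: uniform weights when none are given.
    """
    mats = np.stack([connective_matrix(l) for l in base_clusterings])
    if weights is None:
        weights = np.full(len(mats), 1.0 / len(mats))
    return np.tensordot(np.asarray(weights, dtype=float), mats, axes=1)

def coherent_link(base_clusterings):
    """Assumed definition: 1 only where ALL base clusterings co-cluster the pair."""
    mats = np.stack([connective_matrix(l) for l in base_clusterings])
    return mats.min(axis=0)

def stack_tensor(base_clusterings, weights=None):
    """Stack the coherent-link and co-association matrices into an n x n x 2 tensor."""
    M = coherent_link(base_clusterings)
    A = co_association(base_clusterings, weights)
    return np.stack([M, A], axis=2)

# Three toy base clusterings of four samples (illustrative data only).
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
A = co_association(base)
M = coherent_link(base)
T = stack_tensor(base)
```

In this toy example, samples 0 and 1 are grouped together by every base clustering, so their entry is 1 in both matrices. Samples 1 and 2 are grouped together by only one of the three, so the pair has a co-association of 1/3 and no coherent link. The resulting tensor `T` is what a low-rank approximation step would take as input.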
