ML LG · Mar 21, 2019

Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification

arXiv:1903.09029v22.21 citations

Originality: Incremental advance

AI Analysis

This work addresses uncertainty quantification in multi-view clustering for high-dimensional data, offering improved interpretability in applications like genomics, though it appears incremental as it builds on existing multi-view methods.

The authors tackled the problem of multi-view clustering in high-dimensional data with uncertainty quantification, proposing an approximate Bayes approach that yields interpretable results in a traumatic brain injury study using gene expression data.

High-dimensional data often contain multiple facets, and several clustering patterns can co-exist under different variable subspaces, also known as views. Although multi-view clustering algorithms have been proposed, uncertainty quantification remains difficult: a particular challenge lies in the high complexity of estimating the cluster assignment probability under each view and of sharing information among views. In this article, we propose an approximate Bayes approach that treats the similarity matrices generated over the views as rough first-stage estimates of the co-assignment probabilities; in their Kullback-Leibler neighborhood, we obtain a refined low-rank matrix, formed by the pairwise product of simplex coordinates. Interestingly, each simplex coordinate directly encodes the cluster assignment uncertainty. For multi-view clustering, we let each view draw a parameterization from a few candidates, leading to dimension reduction. With high model flexibility, the estimation can be carried out efficiently as a continuous optimization problem and hence enjoys gradient-based computation. The theory establishes the connection of this model to a random partition distribution under multiple views. Compared to single-view clustering approaches, substantially more interpretable results are obtained when clustering brains from a human traumatic brain injury study, using high-dimensional gene expression data.

KEY WORDS: Co-regularized Clustering, Consensus, PAC-Bayes, Random Cluster Graph, Variable Selection
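The refinement step described in the abstract can be illustrated with a small single-view sketch. This is not the authors' algorithm: it assumes that "pairwise product of simplex coordinates" means a co-assignment matrix P with entries P_ij = w_i . w_j, that closeness to the similarity matrix S is measured by an entrywise Bernoulli Kullback-Leibler divergence, and that the simplex constraint is handled by a softmax reparameterization with plain gradient descent. The function names `softmax` and `fit_simplex_positions`, the step size, and the toy similarity matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, so that every row lies on the probability simplex.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_simplex_positions(S, k, steps=2000, lr=0.2, seed=0, eps=1e-6):
    """Fit simplex coordinates W (n x k) so that W @ W.T is close to S.

    S is a symmetric similarity matrix with entries in [0, 1].
    The loss is the off-diagonal Bernoulli cross-entropy between S and
    P = W @ W.T, which equals KL(S || P) up to a constant in W.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    Z = rng.normal(scale=0.1, size=(n, k))      # unconstrained parameters
    off_diag = ~np.eye(n, dtype=bool)           # ignore self-similarity
    for _ in range(steps):
        W = softmax(Z)
        # Clip for numerical safety; the gradient ignores the clipping.
        P = np.clip(W @ W.T, eps, 1.0 - eps)
        G = (-(S / P) + (1.0 - S) / (1.0 - P)) * off_diag   # dLoss/dP
        dW = (G + G.T) @ W                                  # dLoss/dW
        # Backpropagate through the row-wise softmax.
        dZ = W * (dW - (dW * W).sum(axis=1, keepdims=True))
        Z -= lr * dZ / n
    return softmax(Z)

# Toy similarity matrix (made up for illustration): two blocks of five items.
S = np.full((10, 10), 0.1)
S[:5, :5] = 0.9
S[5:, 5:] = 0.9
np.fill_diagonal(S, 1.0)

W = fit_simplex_positions(S, k=2)
labels = W.argmax(axis=1)
print(np.round(W, 2))   # each row is a soft cluster assignment
print(labels)
```

Each row of `W` sums to one, so it can be read as a soft assignment whose spread reflects how uncertain the item's cluster membership is under this sketch. The multi-view part of the model, in which each view draws its parameterization from a few candidates, is not covered here.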
