ML · LG · CO · Oct 24, 2017

A Bayesian Method for Joint Clustering of Vectorial Data and Network Data

arXiv:1710.08846v1
Originality: Incremental advance
AI Analysis

This method integrates multiple data types for clustering. The advance is incremental because it builds on existing models such as the Gaussian mixture model and the stochastic block model.

The authors address the problem of clustering objects described by both vectorial and network data. They develop a Bayesian model-based method that clusters both data types simultaneously within one integrative probabilistic model, and they report that it performs much better than alternative methods on synthetic and real data.

We present a new model-based integrative method for clustering objects given both vectorial data, which describes the features of each object, and network data, which indicates the similarity of connected objects. The proposed general model is able to cluster the two types of data simultaneously within one integrative probabilistic model, while traditional methods can only handle one data type or depend on transforming one data type to another. Bayesian inference of the clustering is conducted based on a Markov chain Monte Carlo algorithm. A special case of the general model combining the Gaussian mixture model and the stochastic block model is extensively studied. We used both synthetic data and real data to evaluate this new method and compare it with alternative methods. The results show that our simultaneous clustering method performs much better than the alternatives. This improvement is due to the power of the model-based probabilistic approach for efficiently integrating information.
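To make the Gaussian mixture plus stochastic block model combination concrete, here is a minimal sketch of how shared cluster labels can tie the two data types together. It is not the authors' implementation: the abstract does not give the priors, the parameterization, or the sampler details, so everything below is an assumption made for illustration. In particular, the sketch assumes isotropic Gaussian clusters, an undirected unweighted graph without self-loops, and model parameters that are held fixed, so only the labels are resampled. The function names `joint_log_likelihood` and `gibbs_sweep` are hypothetical.

```python
import numpy as np


def joint_log_likelihood(X, A, z, means, sigma, P):
    """Log-likelihood of features X and adjacency A under shared labels z.

    Assumed model (illustrative only): X[i] ~ Normal(means[z[i]], sigma^2 I)
    and A[i, j] ~ Bernoulli(P[z[i], z[j]]) independently for each pair i < j.
    """
    n, d = X.shape
    # Gaussian mixture part: squared distance of each point to its cluster mean.
    diff = X - means[z]
    ll_vec = (-0.5 * np.sum(diff ** 2) / sigma ** 2
              - n * d * np.log(sigma * np.sqrt(2.0 * np.pi)))
    # Stochastic block model part: one Bernoulli term per unordered pair.
    rows, cols = np.triu_indices(n, k=1)
    p = P[z[rows], z[cols]]
    a = A[rows, cols]
    ll_net = np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
    return ll_vec + ll_net


def gibbs_sweep(X, A, z, means, sigma, P, weights, rng):
    """One sweep of single-site Gibbs updates of z, other parameters fixed."""
    n = X.shape[0]
    K = len(weights)
    z = z.copy()
    for i in range(n):
        others = np.arange(n) != i
        # Feature evidence for each candidate cluster, shape (K,).
        log_vec = -0.5 * np.sum((X[i] - means) ** 2, axis=1) / sigma ** 2
        # Edge evidence: probability of i's edges if i were in cluster k.
        p = P[:, z[others]]                      # shape (K, n - 1)
        a = A[i, others]                         # shape (n - 1,)
        log_net = np.sum(a * np.log(p) + (1 - a) * np.log(1 - p), axis=1)
        # Normalize the full conditional in a numerically stable way.
        logits = np.log(weights) + log_vec + log_net
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        z[i] = rng.choice(K, p=probs)
    return z
```

Both evidence terms enter the same conditional distribution for each label, which is the sense in which the two data types are clustered simultaneously rather than one being transformed into the other. A full Bayesian treatment would also place priors on the means, variances, mixing weights, and block probabilities and sample them alongside the labels.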

Code Implementations: 1 repo
