ML | Jan 30, 2014

Sparse Bayesian Unsupervised Learning

arXiv:1401.8017v1 | 7 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of high-dimensional data analysis for researchers and practitioners, offering a theoretical guarantee for optimal parameter selection, though it appears incremental in its method.

The paper tackles unsupervised learning in high-dimensional settings by developing a sparse Bayesian method for variable selection, clustering, and estimation, proving a sparsity oracle inequality that ensures optimal selection of cluster numbers and relevant variables.

This paper is about variable selection, clustering, and estimation in an unsupervised high-dimensional setting. Our approach is based on fitting constrained Gaussian mixture models, where we learn the number of clusters $K$ and the set of relevant variables $S$ using a generalized Bayesian posterior with a sparsity-inducing prior. We prove a sparsity oracle inequality which shows that this procedure selects the optimal parameters $K$ and $S$. The procedure is implemented using a Metropolis-Hastings algorithm with a clustering-oriented greedy proposal, which makes convergence to the posterior very fast.
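To make the idea of sampling $(K, S)$ from a generalized posterior concrete, here is a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the risk is a unit-variance spherical model fitted with k-means (a stand-in for the constrained Gaussian mixture), the prior is a hypothetical penalty $\exp(-c\,K\,|S|)$, and the proposal is a plain symmetric random walk rather than the clustering-oriented greedy proposal described in the abstract. All function names and the constants `lam`, `c`, and `K_max` are invented here.

```python
import numpy as np


def make_toy_data(n_per=50, d=6, seed=1):
    """Three well-separated clusters in variables 0 and 1; the rest is noise."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(3 * n_per, d))
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X[:, :2] += np.repeat(centers, n_per, axis=0)
    return X


def kmeans_sse(X, K, n_iter=10):
    """Sum of squared distances to the nearest of K centers (Lloyd's algorithm)."""
    # Deterministic farthest-first start, so the risk is a fixed function of (K, S).
    centers = [X[0]]
    for _ in range(K - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = X[labels == k].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def risk(X, K, mask):
    """Negative log-likelihood, up to a constant, of the unit-variance model."""
    # Variables outside S share one mean; variables in S get one mean per cluster.
    sse = ((X[:, ~mask] - X[:, ~mask].mean(axis=0)) ** 2).sum()
    if mask.any():
        sse += kmeans_sse(X[:, mask], K)
    return 0.5 * sse


def log_post(X, K, mask, lam=1.0, c=25.0):
    """Log of exp(-lam * risk) times the hypothetical prior exp(-c * K * |S|)."""
    return -lam * risk(X, K, mask) - c * K * mask.sum()


def mh_select(X, n_steps=2000, K_max=6, seed=0):
    """Random-walk Metropolis-Hastings over (K, S); returns the best visited state."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    K, mask = 1, np.zeros(d, dtype=bool)
    cur = log_post(X, K, mask)
    best = (cur, K, mask.copy())
    for _ in range(n_steps):
        K_new, mask_new = K, mask.copy()
        if rng.random() < 0.5:
            j = rng.integers(d)  # flip one variable in or out of S
            mask_new[j] = not mask_new[j]
        else:
            K_new = K + int(rng.choice([-1, 1]))  # move K up or down by one
        if not 1 <= K_new <= K_max:
            continue  # out-of-range proposal counts as a rejection
        new = log_post(X, K_new, mask_new)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if np.log(rng.random()) < new - cur:
            K, mask, cur = K_new, mask_new, new
            if cur > best[0]:
                best = (cur, K, mask.copy())
    return int(best[1]), np.flatnonzero(best[2])
```

Calling `mh_select(make_toy_data())` returns an estimate of $K$ together with the indices of the selected variables. The penalty weight `c` controls how strongly additional clusters and variables are discouraged; the value used here was picked by hand for the toy data and carries no theoretical meaning.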
