Mixture of von Mises-Fisher distribution with sparse prototypes
This work provides an interpretable clustering method for high-dimensional directional data such as text, though it appears incremental, since it combines existing von Mises-Fisher mixtures with sparsity regularization.
The authors tackle clustering of high-dimensional directional data such as text by proposing a mixture of von Mises-Fisher distributions estimated with an L1-penalized likelihood. The resulting sparse prototypes improve clustering interpretability, as demonstrated on simulated data, real benchmarks, and a new financial reports dataset.
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This approach is particularly well suited to high-dimensional directional data such as text. We propose in this article to estimate a von Mises-Fisher mixture using an L1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real data benchmarks. We also introduce a new data set of financial reports and exhibit the benefits of our method for exploratory analysis.
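
To make the idea of a sparse prototype concrete, here is a minimal sketch of how an L1 penalty can zero out coordinates of one component's mean direction inside an M-step. It is an illustration under stated assumptions, not the authors' exact update: it assumes the penalty is applied to the mean direction, that the concentration `kappa` is held fixed, and that the sub-problem is to maximise `kappa * r.mu - lam * ||mu||_1` over unit vectors `mu`, where `r` is the responsibility-weighted sum of the observations. Under those assumptions the maximiser is the soft-thresholded `r`, renormalised to unit length. The names `soft_threshold`, `sparse_direction`, `resp`, `kappa` and `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each coordinate of v towards zero by t, zeroing the small ones."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_direction(X, resp, kappa, lam):
    """Sparse unit-norm prototype for one mixture component (assumed update).

    X     : (n, d) array whose rows are unit vectors
    resp  : (n,) responsibilities of this component for each row
    kappa : concentration of the component, held fixed here (assumption)
    lam   : L1 penalty weight (assumption: penalty acts on the direction)
    """
    r = resp @ X                           # responsibility-weighted sum
    s = soft_threshold(r, lam / kappa)     # drop weakly supported coordinates
    norm = np.linalg.norm(s)
    if norm == 0.0:
        raise ValueError("penalty too large: every coordinate was zeroed")
    return s / norm                        # project back onto the unit sphere


# Toy data: three unit vectors in dimension 3, all assigned to one component.
X = np.array([[0.8, 0.6, 0.0],
              [0.6, 0.8, 0.0],
              [0.96, 0.0, 0.28]])
resp = np.ones(3)

mu = sparse_direction(X, resp, kappa=10.0, lam=5.0)
mu_unpenalised = sparse_direction(X, resp, kappa=10.0, lam=0.0)
```

With `lam=0.0` the function returns the usual normalised weighted mean, while a positive `lam` removes the weakly supported third coordinate. Increasing `lam` over a grid and recording which coordinates survive gives a rough picture of the sparsity versus likelihood trade-off that the abstract explores with its path-following algorithm.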