LG ST ML

Dec 30, 2022

Mixture of von Mises-Fisher distribution with sparse prototypes

arXiv:2212.14591v13.35 citations

h-index: 4

Originality: Incremental advance

AI Analysis

This paper provides an interpretable clustering method for high-dimensional directional data such as text. The contribution appears incremental, since it builds on existing von Mises-Fisher mixtures by adding sparsity regularization.

The authors tackle clustering of high-dimensional directional data such as text by proposing a mixture of von Mises-Fisher distributions estimated with an L1-penalized likelihood. The resulting sparse prototypes improve clustering interpretability, as demonstrated on simulated data, real benchmarks, and a new financial reports dataset.

Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. They are particularly well suited to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises-Fisher mixture using an L1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real data benchmarks. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
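To make the idea of an L1-penalized EM concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a single concentration parameter `kappa` that is shared by all components and held fixed, so the von Mises-Fisher normalizing constant cancels in the E-step and no Bessel functions are needed. Under that assumption the penalized M-step for each prototype is a linear objective with an L1 penalty on the unit sphere, which is maximized by soft-thresholding followed by renormalization. The names `soft_threshold`, `sparse_vmf_em`, `lam` and `kappa` are hypothetical, and the path-following procedure over the penalty is not reproduced.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: shrink towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_vmf_em(X, n_clusters, lam, kappa=10.0, n_iter=50, seed=0):
    """EM sketch for a vMF mixture with L1-penalized mean directions.

    Assumptions (not taken from the paper): rows of X have unit L2 norm,
    kappa is shared across components and kept fixed.
    Returns prototypes mu (K, d), proportions pi (K,), and the
    responsibilities tau (n, K) computed in the last E-step.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise prototypes with randomly chosen (unit-norm) data points.
    mu = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    pi = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: with a shared kappa the normalizing constant cancels,
        # leaving log pi_k + kappa * <mu_k, x_i> up to a constant.
        logits = np.log(pi) + kappa * (X @ mu.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        tau = np.exp(logits)
        tau /= tau.sum(axis=1, keepdims=True)

        # M-step for the mixing proportions (clamped away from zero).
        pi = np.maximum(tau.mean(axis=0), 1e-12)
        pi /= pi.sum()

        # M-step for the prototypes: maximise
        #   kappa * <r_k, mu_k> - lam * ||mu_k||_1  subject to ||mu_k||_2 = 1,
        # whose solution is the normalised soft-thresholded vector.
        r = tau.T @ X
        s = soft_threshold(kappa * r, lam)
        norms = np.linalg.norm(s, axis=1, keepdims=True)
        keep = norms[:, 0] > 0  # if everything is thresholded, keep old mu
        mu[keep] = s[keep] / norms[keep]

    return mu, pi, tau
```

Larger values of `lam` zero out more coordinates of each prototype, which is the sparsity versus likelihood trade-off the abstract describes. On text data the surviving coordinates could be read as the terms that characterise each cluster.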
