LGSTML | Dec 30, 2022

Mixture of von Mises-Fisher distribution with sparse prototypes

arXiv:2212.14591v1 | 5 citations | h-index: 4
Originality: Incremental advance
AI Analysis

The paper provides an interpretable clustering method for high-dimensional directional data such as text. The contribution appears incremental, since it builds on existing von Mises-Fisher mixtures and adds sparsity regularization.

The authors tackled clustering of high-dimensional directional data like text by proposing a mixture of von Mises-Fisher distributions with L1-penalized likelihood estimation, resulting in sparse prototypes that improve clustering interpretability, as demonstrated on simulated data, real benchmarks, and a new financial reports dataset.

Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises-Fisher mixture using an ℓ1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real data benchmarks. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
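
To make the idea of sparse prototypes concrete, here is a minimal sketch of an EM loop for a penalized mixture on the unit hypersphere. It is an illustration under simplifying assumptions, not the authors' algorithm: the concentration `kappa` is assumed fixed and shared by all components (so the normalizing constant cancels in the E-step), the penalty weight `lam` is held constant rather than followed along a path, and the names `soft_threshold` and `sparse_vmf_em` are hypothetical. Under these assumptions, the penalized M-step for each prototype reduces to soft-thresholding the responsibility-weighted sum of the data and renormalizing to unit length.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (hypothetical helper)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_vmf_em(X, n_clusters, lam, kappa=10.0, n_iter=50, seed=0):
    """Sketch of EM for a mixture on the sphere with L1-penalized prototypes.

    Assumptions (not taken from the paper): kappa is fixed and shared by
    all components, and every row of X is non-zero.
    """
    rng = np.random.default_rng(seed)
    # Project the data onto the unit hypersphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = X.shape[0]
    # Initialise prototypes with randomly chosen data points.
    mu = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    alpha = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: responsibilities proportional to alpha_k * exp(kappa * mu_k . x).
        logits = np.log(np.maximum(alpha, 1e-12)) + kappa * (X @ mu.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: update mixing proportions.
        alpha = resp.mean(axis=0)
        # M-step: maximise kappa * mu_k . r_k - lam * ||mu_k||_1 with ||mu_k||_2 = 1,
        # where r_k is the responsibility-weighted sum of the data.
        R = resp.T @ X
        for k in range(n_clusters):
            s = soft_threshold(R[k], lam / kappa)
            norm = np.linalg.norm(s)
            if norm > 0.0:
                mu[k] = s / norm
            # If everything is thresholded away, keep the previous prototype.

    return mu, alpha, resp
```

Calling `sparse_vmf_em(X, n_clusters=2, lam=50.0)` on a document-term matrix returns unit-norm prototypes `mu` whose zero entries mark the dimensions ignored by each cluster. Larger values of `lam` give sparser prototypes at the cost of likelihood, which is the trade-off the abstract describes.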

