ML LG ME · Dec 7, 2022

A parallelizable model-based approach for marginal and multivariate clustering

arXiv:2212.04009v1 · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses clustering challenges for datasets whose margins have different cluster structures. It offers a computationally efficient and parallelizable method, though it appears to be an incremental advance, since it builds on existing model-based approaches.

The paper tackles the implicit assumption in model-based clustering that every margin has the same number of clusters. It does so by introducing a finite mixture model per margin and a strategy-game-inspired algorithm called Reign-and-Conquer. The method performs well in numerical experiments on artificial data and is applied to real datasets.
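To make the margin issue concrete, here is one way to write it down, assuming Gaussian components (the summary does not state the component family, and the notation is illustrative). Marginalizing a joint K-component mixture yields a mixture with the same K components on every coordinate, so every margin inherits the same cluster count (fewer only if components coincide on that margin). The per-margin specification instead lets each margin j carry its own number of components K_j:

```latex
% Joint d-variate Gaussian mixture with K components and its j-th margin
f(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \phi_d(\mathbf{x};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
\quad\Longrightarrow\quad
f_j(x_j) = \sum_{k=1}^{K} \pi_k \, \phi(x_j;\, \mu_{kj}, \Sigma_{k,jj}), \qquad j = 1, \dots, d

% Per-margin specification: margin j has its own number of components K_j
f_j(x_j) = \sum_{k=1}^{K_j} \pi_{jk} \, \phi(x_j;\, \mu_{jk}, \sigma_{jk}^2),
\qquad \sum_{k=1}^{K_j} \pi_{jk} = 1, \qquad j = 1, \dots, d
```

In the first line the same K and the same weights \pi_k appear on every margin; in the second, K_j and \pi_{jk} are free per margin, and the joint distribution is left unspecified.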

This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering is likely to lead to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy-game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins (and leaves the joint unspecified), it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
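The parallelizable marginal step can be illustrated with a minimal sketch, assuming univariate Gaussian mixtures on each margin with the number of components chosen by BIC (both are assumptions; the abstract specifies neither the component family nor the selection criterion). Each margin is fitted independently, so the fits can be dispatched concurrently. The final step, which turns marginal labels into multivariate clusters, is a naive stand-in that treats every observed tuple of marginal labels as one cluster; it is NOT Reign-and-Conquer, whose rules are not given here. All function names (`fit_gmm_1d`, `select_margin`, `marginal_clustering`) and the toy data are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def normal_pdf(x, mu, var):
    """Density of the univariate normal N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=100):
    """Fit a k-component univariate Gaussian mixture by EM (hypothetical helper).

    Means start at evenly spaced empirical quantiles (deterministic init).
    Returns (weights, means, variances, log_likelihood).
    """
    n = len(xs)
    srt = sorted(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    floor = 1e-6 * var_all + 1e-12  # keeps variances strictly positive
    weights = [1.0 / k] * k
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: responsibility-weighted updates of weights, means, variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(floor, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in xs
    )
    return weights, means, variances, loglik


def select_margin(xs, max_k=4):
    """Choose the number of components for one margin by BIC; return (k, hard labels).

    Labels are ranked by component mean, so label 0 is the leftmost component.
    """
    n = len(xs)
    best = None
    for k in range(1, max_k + 1):
        w, m, v, ll = fit_gmm_1d(xs, k)
        bic = -2.0 * ll + (3 * k - 1) * math.log(n)  # 3k - 1 free parameters
        if best is None or bic < best[0]:
            best = (bic, k, w, m, v)
    _, k, w, m, v = best
    rank = {j: r for r, j in enumerate(sorted(range(k), key=lambda j: m[j]))}
    labels = [rank[max(range(k), key=lambda j: w[j] * normal_pdf(x, m[j], v[j]))] for x in xs]
    return k, labels


def marginal_clustering(data, max_k=4):
    """Fit every margin independently, then combine the marginal labels.

    Margins share no parameters, so their fits are submitted to a pool at once.
    The combination is a naive cross-product of marginal labels, NOT Reign-and-Conquer.
    """
    columns = [list(col) for col in zip(*data)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda col: select_margin(col, max_k), columns))
    ks = [k for k, _ in results]
    cells = zip(*(labels for _, labels in results))  # one label tuple per observation
    ids = {}
    joint = [ids.setdefault(cell, len(ids)) for cell in cells]
    return ks, joint


# Toy data (hypothetical): 3 joint groups, 2 distinct levels on margin 1, 3 on margin 2.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 10.0), (10.0, 20.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(60)]
ks, joint = marginal_clustering(data)
print("components per margin:", ks, "| joint clusters:", len(set(joint)))
```

Because the margins share no parameters, the marginal step scales roughly linearly with the number of margins and never touches a joint covariance matrix. In pure Python a thread pool only demonstrates the independence (the GIL serializes the arithmetic); real speedups would need a process pool with a top-level worker function, or a compiled numerical backend.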
