ML | AI | LG | CO | ME | Feb 7, 2023

Sparse and geometry-aware generalisation of the mutual information for joint discriminative clustering and feature selection

arXiv:2302.03391v2 | 4 citations | h-index: 26
AI Analysis

This paper addresses feature selection for clustering in high-dimensional data, but the contribution is incremental: it adds a sparsity penalty to an existing generalization of mutual information.

The paper tackles the problem of feature selection in clustering by introducing Sparse GEMINI, a discriminative clustering model that maximizes a geometry-aware generalization of mutual information with an l1 penalty, avoiding combinatorial exploration and scaling to high-dimensional data. Results show it is competitive and selects relevant variable subsets without prior hypotheses.

Feature selection in clustering is a hard task that involves simultaneously discovering relevant clusters and the variables relevant to those clusters. While feature selection algorithms are often model-based, relying on optimised model selection or strong assumptions on the data distribution, we introduce a discriminative clustering model that maximises a geometry-aware generalisation of the mutual information, called GEMINI, with a simple l1 penalty: the Sparse GEMINI. This algorithm avoids the burden of combinatorial feature subset exploration and scales easily to high-dimensional data and large numbers of samples, while requiring only the design of a discriminative clustering model. We demonstrate the performance of Sparse GEMINI on synthetic and large-scale datasets. Our results show that Sparse GEMINI is a competitive algorithm and can select relevant subsets of variables with respect to the clustering without using relevance criteria or prior hypotheses.
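The general shape of such an objective, a clustering score minus an l1 penalty, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: classical mutual information stands in for GEMINI (whose exact form is not given in this summary), the model is assumed to be a single linear layer followed by a softmax, and every function name and the parameter `alpha` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mutual_information(p):
    """Plug-in estimate of I(X; Y) from soft assignments p[i, k] = p(y=k | x_i).

    ASSUMPTION: classical mutual information is used here as a stand-in
    for GEMINI, which this sketch does not reproduce.
    """
    eps = 1e-12
    marginal = p.mean(axis=0)                                  # estimate of p(y=k)
    marginal_entropy = -(marginal * np.log(marginal + eps)).sum()
    conditional_entropy = -(p * np.log(p + eps)).sum(axis=1).mean()
    return marginal_entropy - conditional_entropy

def penalised_objective(W, X, alpha):
    """Clustering score minus an l1 penalty on the weights (to be maximised).

    W has shape (n_features, n_clusters); alpha is a hypothetical
    penalty strength.
    """
    p = softmax(X @ W)
    return mutual_information(p) - alpha * np.abs(W).sum()

def soft_threshold(W, t):
    """Proximal operator of t * ||W||_1: shrinks entries toward zero."""
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def selected_features(W, tol=0.0):
    """Indices of features whose weight row is not entirely zero."""
    return np.flatnonzero(np.abs(W).sum(axis=1) > tol)
```

Under these assumptions, a feature counts as discarded once its whole row of `W` has been shrunk to zero, so the selected subset can be read directly off the weights, with no combinatorial search over feature subsets. For example, alternating a gradient ascent step on the clustering score with `soft_threshold` is one standard way to optimise an l1-penalised objective, though the summary does not state which optimiser the paper uses.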

