LG · AI · Dec 19, 2023

Automatic Parameter Selection for Non-Redundant Clustering

Collin Leiber, Dominik Mautz, Claudia Plant, Christian Böhm

arXiv:2312.11952v2 · 3.81 citations · h-index: 29 · SDM

Originality: Incremental advance

AI Analysis

The work addresses the challenge of parameter specification for users clustering multi-view data, though the advance is incremental because it builds on existing non-redundant clustering approaches.

The paper tackles automatic parameter selection for non-redundant clustering in high-dimensional datasets. It proposes a framework based on the Minimum Description Length Principle that efficiently detects subspaces, clusters per subspace, and outliers; experiments show it is highly competitive with state-of-the-art methods.

High-dimensional datasets often contain multiple meaningful clusterings in different subspaces. For example, objects can be clustered by color, weight, or size, revealing different interpretations of the given dataset. A variety of approaches are able to identify such non-redundant clusterings. However, most of these methods require the user to specify the expected number of subspaces and clusters for each subspace. Stating these values is a non-trivial problem and usually requires detailed knowledge of the input dataset. In this paper, we propose a framework that utilizes the Minimum Description Length Principle (MDL) to detect the number of subspaces and clusters per subspace automatically. We describe an efficient procedure that greedily searches the parameter space by splitting and merging subspaces and clusters within subspaces. Additionally, an encoding strategy is introduced that allows us to detect outliers in each subspace. Extensive experiments show that our approach is highly competitive with state-of-the-art methods.
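
To make the MDL idea concrete, the following is a minimal sketch of choosing the number of clusters by description length. It is not the paper's encoding or search procedure: it assumes 1-D data, a simple two-part code (parameter cost plus Gaussian data cost plus cluster-ID cost), and an exhaustive scan over k instead of the greedy split-and-merge search, and it ignores subspaces and outliers. The names `kmeans_1d`, `description_length`, `select_k`, and the `precision` parameter are hypothetical.

```python
import math
import random


def kmeans_1d(xs, k, iters=100):
    """Lloyd's k-means on 1-D data with deterministic quantile initialisation."""
    s = sorted(xs)
    n = len(s)
    centers = [s[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[j].append(x)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:  # assignments are stable, so groups match centers
            break
        centers = new
    return centers, groups


def description_length(xs, k, precision=0.01):
    """Two-part code length in bits (an assumed, simplified encoding)."""
    centers, groups = kmeans_1d(xs, k)
    n = len(xs)
    # Model cost: k means, k variances, k - 1 cluster weights,
    # each charged 0.5 * log2(n) bits.
    bits = 0.5 * (3 * k - 1) * math.log2(n)
    for c, g in zip(centers, groups):
        if not g:
            continue
        var = max(sum((x - c) ** 2 for x in g) / len(g), precision ** 2)
        id_bits = -math.log2(len(g) / n)  # cost of naming the cluster
        for x in g:
            # Negative log2 Gaussian density, discretised to `precision`.
            nll = 0.5 * math.log2(2 * math.pi * var)
            nll += (x - c) ** 2 / (2 * var) * math.log2(math.e)
            bits += id_bits + nll - math.log2(precision)
    return bits


def select_k(xs, max_k=8):
    """Return the k in 1..max_k with the shortest description length."""
    return min(range(1, max_k + 1), key=lambda k: description_length(xs, k))


# Three well-separated synthetic clusters (illustrative data only).
rng = random.Random(0)
xs = [rng.gauss(m, 0.5) for m in (0.0, 10.0, 20.0) for _ in range(50)]
best_k = select_k(xs)
print(best_k)
```

The trade-off the sketch illustrates is that adding a cluster shortens the data cost only if the variance reduction outweighs the extra parameter and cluster-ID bits, so no user-supplied k is needed. A full scan is used here because, on this kind of data, the cost need not decrease monotonically on the way to the best k, so stopping at the first increase could end the search too early.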

