LG · DS · Oct 18, 2022

Clustering Categorical Data: Soft Rounding k-modes

arXiv:2210.09640v3 · 9 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses a specific limitation of the widely used k-modes algorithm for clustering categorical data, offering researchers and practitioners an incremental improvement.

The authors show that the classical k-modes algorithm performs poorly over a large range of parameters in a natural generative block model for categorical data. They propose a soft rounding variant (SoftModes), prove that it addresses these drawbacks in the generative model, and verify empirically that it performs well on both synthetic and real-world datasets.

Over the last three decades, researchers have intensively explored various clustering tools for categorical data analysis. Despite the proposal of various clustering algorithms, the classical k-modes algorithm remains a popular choice for unsupervised learning of categorical data. Surprisingly, our first insight is that in a natural generative block model, the k-modes algorithm performs poorly for a large range of parameters. We remedy this issue by proposing a soft rounding variant of the k-modes algorithm (SoftModes) and theoretically prove that our variant addresses the drawbacks of the k-modes algorithm in the generative model. Finally, we empirically verify that SoftModes performs well on both synthetic and real-world datasets.
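To make the contrast in the abstract concrete, here is a minimal sketch of a k-modes loop with two interchangeable center updates. The `mode_center` update is the classical rule: each coordinate of a center takes the most frequent category in its cluster. The `soft_center` update is only an illustration of what a "soft rounding" step could look like, assuming a center coordinate is sampled with probability proportional to a power of its frequency. That sampling rule, the `power` parameter, and all function names are assumptions made for this sketch; the abstract does not state the exact SoftModes rule, so this should not be read as the authors' algorithm.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of coordinates where two categorical points differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center (Hamming distance) for each point."""
    return [
        min(range(len(centers)), key=lambda j: hamming(p, centers[j]))
        for p in points
    ]


def mode_center(cluster):
    """Classical k-modes update: most frequent category per coordinate."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))


def soft_center(cluster, rng, power=2.0):
    """Hypothetical soft rounding update (an assumption, not the paper's rule):
    sample each coordinate with probability proportional to frequency**power."""
    center = []
    for col in zip(*cluster):
        counts = Counter(col)
        cats = sorted(counts)
        weights = [counts[c] ** power for c in cats]
        center.append(rng.choices(cats, weights=weights, k=1)[0])
    return tuple(center)


def k_modes(points, k, iters=10, soft=False, seed=0):
    """Alternate between assignment and center update for a fixed number of rounds."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            if not cluster:
                continue  # an empty cluster keeps its previous center
            centers[j] = soft_center(cluster, rng) if soft else mode_center(cluster)
    return assign(points, centers), centers
```

Calling `k_modes(points, k, soft=True)` switches to the sampled update. In this sketch, raising `power` concentrates the sampling on the most frequent category, so the soft update approaches the classical mode update as `power` grows.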

Code Implementations: 1 repo
