LGMEMLApr 24, 2013

The K-modes algorithm for clustering

arXiv:1304.6478v128 citations
Originality Incremental advance
AI Analysis

This addresses the problem of finding representative cluster centroids in data analysis, particularly for nonconvex clusters, though it appears incremental as it builds on existing methods like K-means and K-medoids.

The paper tackles the lack of clustering algorithms that return exactly K meaningful modes by proposing a K-modes objective function based on density and cluster assignment, resulting in an algorithm that is robust to nonconvex clusters and outliers, with computational speed slightly slower than K-means but much faster than mean-shift or K-medoids.

Many clustering algorithms exist that estimate a cluster centroid, such as K-means, K-medoids or mean-shift, but no algorithm seems to exist that clusters data by returning exactly K meaningful modes. We propose a natural definition of a K-modes objective function by combining the notions of density and cluster assignment. The algorithm becomes K-means and K-medoids in the limit of very large and very small scales. Computationally, it is slightly slower than K-means but much faster than mean-shift or K-medoids. Unlike K-means, it is able to find centroids that are valid patterns, truly representative of a cluster, even with nonconvex clusters, and appears robust to outliers and misspecification of the scale and number of clusters.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes