LGMLSep 26, 2022

Clustering by Direct Optimization of the Medoid Silhouette

arXiv:2209.12553v112 citationsh-index: 31
Originality Synthesis-oriented
AI Analysis

This work provides incremental improvements for researchers and practitioners in clustering by speeding up optimization of a popular validation measure.

The paper tackles the challenge of efficiently optimizing the medoid-based Silhouette clustering quality measure by developing fast algorithms that combine ideas from Silhouette and PAM, achieving a 10464× speedup on real data with 30000 samples and k=100.

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, and provide two fast versions for the direct optimization. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes