LGDSCOMLFeb 3, 2022

Fast and explainable clustering based on sorting

arXiv:2202.01456v217 citations
AI Analysis

This provides a fast and interpretable clustering solution for data analysis, but it appears incremental as it builds on existing clustering paradigms.

The authors tackled the problem of clustering by introducing CLASSIX, a fast and explainable method that uses sorting and aggregation, and demonstrated that it competes with state-of-the-art algorithms while achieving near linear time complexity.

We introduce a fast and explainable clustering method called CLASSIX. It consists of two phases, namely a greedy aggregation phase of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters, namely a distance parameter for the aggregation and another parameter controlling the minimal cluster size. Extensive experiments are conducted to give a comprehensive evaluation of the clustering performance on synthetic and real-world datasets, with various cluster shapes and low to high feature dimensionality. Our experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes