LG · AI · Oct 4, 2021

GIT: Clustering Based on Graph of Intensity Topology

arXiv:2110.01274v1 · 7 citations · Has Code
Originality: Incremental advance
AI Analysis

This work aims at a clustering algorithm that performs well across several practical criteria at once. The advance appears incremental, since it builds on existing non-convex clustering methods.

The paper tackles the challenge of building a clustering algorithm that meets accuracy, robustness, interpretability, speed, and ease-of-use (ARISE) requirements simultaneously. It proposes GIT, which combines local intensity peaks with a global topological graph, and reports outperforming other non-convex methods by about 10% in F1-score on MNIST and FashionMNIST.
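To make the "local intensity peaks" idea concrete, here is a minimal sketch in the style of density-peak clustering. It is not the authors' implementation: the intensity estimate (inverse mean distance to the `k` nearest neighbours), the function name `local_clusters`, and the toy data are all assumptions chosen for illustration.

```python
import math


def local_clusters(points, k=3):
    """Assign each point to a local intensity peak (illustrative sketch only)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself.
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Assumed intensity: inverse of the mean distance to the k nearest neighbours.
    intensity = [k / (sum(dist[i][j] for j in knn[i]) + 1e-12) for i in range(n)]

    parent = list(range(n))
    for i in range(n):
        higher = [j for j in knn[i] if intensity[j] > intensity[i]]
        if higher:
            # Climb to the closest neighbour with strictly higher intensity.
            parent[i] = min(higher, key=lambda j: dist[i][j])

    def root(i):
        # Parents always have higher intensity, so this walk terminates at a peak.
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]


# Toy data: two well-separated blobs, each a unit square plus its centre.
blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = local_clusters(blob_a + blob_b, k=3)
```

Each label is the index of the peak a point climbs to, so points that share a label form one local cluster. In this toy case the centre of each square has the highest intensity and becomes the peak for its blob.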

Accuracy, Robustness to noise and scale, Interpretability, Speed, and Ease of use (ARISE) are crucial requirements of a good clustering algorithm. However, achieving these goals simultaneously is challenging, and most advanced approaches focus on only some of them. To consider all of these aspects together, we propose a novel clustering algorithm, GIT (Clustering Based on Graph of Intensity Topology). GIT considers both local and global data structures: it first forms local clusters based on intensity peaks of samples, and then estimates the global topological graph (topo-graph) between these local clusters. We use the Wasserstein distance between the predicted and prior class proportions to automatically cut noisy edges in the topo-graph and merge connected local clusters into final clusters. We compare GIT with seven competing algorithms on five synthetic datasets and nine real-world datasets. With fast local cluster detection, robust topo-graph construction, and accurate edge-cutting, GIT shows attractive ARISE performance and significantly exceeds other non-convex clustering methods. For example, GIT outperforms its counterparts by about 10% (F1-score) on MNIST and FashionMNIST. Code is available at https://github.com/gaozhangyang/GIT.
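The edge-cutting step described in the abstract can be sketched as a search over edge-weight thresholds: cut every edge below the threshold, merge what stays connected, and keep the threshold whose merged class proportions are closest to the prior. The sketch below is an assumption-laden illustration, not the code from the repository: the function names, the toy topo-graph, and the choice to compare sorted proportions as histograms over rank positions are all hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q):
    """W1 distance between two histograms on the shared support 0, 1, 2, ..."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    # On a unit-spaced 1-D support, W1 is the summed absolute CDF difference.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def merge_clusters(n_clusters, edges, threshold):
    """Union local clusters joined by an edge with weight >= threshold."""
    parent = list(range(n_clusters))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j, w in edges:
        if w >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n_clusters)]


def proportions(sizes, labels):
    """Share of samples in each merged cluster, sorted in descending order."""
    totals = {}
    for size, lab in zip(sizes, labels):
        totals[lab] = totals.get(lab, 0) + size
    n = sum(sizes)
    return sorted((t / n for t in totals.values()), reverse=True)


def cut_edges(sizes, edges, prior):
    """Pick the threshold whose merged proportions best match the prior."""
    prior = sorted(prior, reverse=True)
    candidates = sorted({w for _, _, w in edges}) + [float("inf")]
    best = None
    for t in candidates:
        labels = merge_clusters(len(sizes), edges, t)
        d = wasserstein_1d(proportions(sizes, labels), prior)
        if best is None or d < best[0]:
            best = (d, t, labels)
    return best


# Toy topo-graph: four local clusters, two strong edges, one weak (noisy) edge.
sizes = [30, 20, 25, 25]
edges = [(0, 1, 0.9), (2, 3, 0.8), (1, 2, 0.1)]
distance, threshold, labels = cut_edges(sizes, edges, prior=[0.5, 0.5])
```

With a prior of two equally sized classes, the search keeps the two strong edges and drops the weak one, so local clusters 0 and 1 end up together and 2 and 3 end up together. Sorting the proportions before comparing them makes the distance independent of how clusters happen to be labelled, which is one plausible design choice among several.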

Code Implementations: 4 repos
