LG DB, Nov 27, 2023

FLASC: A Flare-Sensitive Clustering Algorithm

arXiv:2311.15887v2, 5 citations, h-index: 19
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for more detailed cluster analysis in exploratory data analysis, offering an incremental improvement for researchers and practitioners in data science.

The paper tackles the problem of identifying meaningful subpopulations within clusters by detecting branches. It builds on HDBSCAN* to create FLASC, a flare-sensitive clustering algorithm whose computational cost scales similarly to that of HDBSCAN* and whose outputs are stable on synthetic data sets. The benefit of branch detection is also demonstrated on two real-world data sets.

Clustering algorithms are often used to find subpopulations in exploratory data analysis workflows. Not only the clusters themselves, but also their shape can represent meaningful subpopulations. In this paper, we present FLASC, an algorithm that detects branches within clusters to identify such subpopulations. FLASC builds upon HDBSCAN*, a state-of-the-art density-based clustering algorithm, and detects branches in a post-processing step that describes within-cluster connectivity. Two variants of the algorithm are presented, which trade computational cost for noise robustness. We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs using synthetic data sets, resulting in an efficient flare-sensitive clustering algorithm. In addition, we demonstrate the benefit of branch-detection on two real-world data sets.
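The abstract does not spell out how branches are found, so the following is only a minimal sketch of the general idea of branch detection as a post-processing step on a single, already-extracted cluster. It is not the authors' implementation. It assumes a k-nearest-neighbour graph as the description of within-cluster connectivity and distance to the centroid as a measure of how far a point sits out on a branch; both choices, along with every function name and the threshold, are hypothetical and chosen for illustration.

```python
import math
from collections import deque


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph as a list of adjacency sets."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def centroid_distance(points):
    """Distance of every point to the cluster centroid (assumed branch measure)."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [math.dist(p, centroid) for p in points]


def detect_branches(points, k=2, threshold=None):
    """Label connected groups of far-from-centre points; -1 marks the core.

    Assumption: a branch is a connected component of the kNN graph restricted
    to points whose centroid distance exceeds `threshold`.
    """
    adj = knn_graph(points, k)
    ecc = centroid_distance(points)
    if threshold is None:
        threshold = 0.5 * max(ecc)  # hypothetical default
    keep = {i for i, e in enumerate(ecc) if e > threshold}

    labels = [-1] * len(points)
    current = 0
    for start in sorted(keep):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:  # breadth-first search within the kept points
            u = queue.popleft()
            for v in adj[u]:
                if v in keep and labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels


# Toy cluster: a centre point plus three arms at 0, 120 and 240 degrees.
star = [(0.0, 0.0)]
for angle in (0.0, 2 * math.pi / 3, 4 * math.pi / 3):
    for r in range(1, 6):
        star.append((r * math.cos(angle), r * math.sin(angle)))

labels = detect_branches(star, k=2)
print(labels)
```

On this toy star-shaped cluster, a density-based clustering would typically report a single cluster, while the sketch separates the outer part of each arm into its own branch and leaves the central points unlabelled. A single fixed threshold is a simplification made here for brevity.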

Code Implementations: 2 repos
