LG | CV | IR | Sep 28, 2025

GBSK: Skeleton Clustering via Granular-ball Computing and Multi-Sampling for Large-Scale Data

arXiv:2509.23742v1 | h-index: 16 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses efficient clustering of large-scale data. The contribution is incremental because it builds on existing granular-ball techniques.

The authors propose GBSK, a scalable skeleton clustering algorithm for large-scale datasets. It uses granular-ball computing and multi-sampling to reduce computational overhead while maintaining high accuracy, and the paper reports strong performance on datasets with up to 100 million instances.

To effectively handle clustering tasks for large-scale datasets, we propose a novel scalable skeleton clustering algorithm, namely GBSK, which leverages the granular-ball technique to capture the underlying structure of data. By multi-sampling the dataset and constructing multi-grained granular-balls, GBSK progressively uncovers a statistical "skeleton" -- a spatial abstraction that approximates the essential structure and distribution of the original data. This strategy enables GBSK to dramatically reduce computational overhead while maintaining high clustering accuracy. In addition, we introduce an adaptive version, AGBSK, with simplified parameter settings to enhance usability and facilitate deployment in real-world scenarios. Extensive experiments conducted on standard computing hardware demonstrate that GBSK achieves high efficiency and strong clustering performance on large-scale datasets, including one with up to 100 million instances across 256 dimensions. Our implementation and experimental results are available at: https://github.com/XFastDataLab/GBSK/.
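
The general idea in the abstract (sample repeatedly, summarize each sample with small balls, then cluster the ball centers instead of the raw points) can be illustrated with a toy sketch. This is not the authors' implementation, which lives in the linked repository. Every function name and parameter below (`granular_balls`, `skeleton_cluster`, `max_ball`, and so on) is hypothetical, the splitting rule is a simple two-seed partition chosen for brevity, and plain k-means stands in for whatever final clustering step GBSK actually uses on the skeleton.

```python
import math
import random


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def split_in_two(points, rng):
    """Assumed splitting rule: a random seed and the point farthest from it."""
    a = rng.choice(points)
    b = max(points, key=lambda p: math.dist(p, a))
    left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
    return left, right


def granular_balls(points, max_ball, rng):
    """Split recursively until every ball holds at most max_ball points."""
    stack, balls = [points], []
    while stack:
        group = stack.pop()
        if len(group) <= max_ball:
            balls.append(group)
            continue
        left, right = split_in_two(group, rng)
        if not left or not right:  # all points identical, cannot split further
            balls.append(group)
            continue
        stack.extend([left, right])
    return balls


def kmeans(points, k, iters=10):
    """Stand-in final step: k-means with farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


def skeleton_cluster(data, k, n_samples=5, sample_size=200, max_ball=20, seed=0):
    """Build a skeleton from several samples, cluster it, then label all points."""
    rng = random.Random(seed)
    skeleton = []
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        skeleton += [centroid(b) for b in granular_balls(sample, max_ball, rng)]
    centers = kmeans(skeleton, k)
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in data]
    return labels, skeleton


# Toy data: two well-separated 2-D Gaussian blobs.
data_rng = random.Random(1)
blob_a = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
blob_b = [(data_rng.gauss(20, 1), data_rng.gauss(20, 1)) for _ in range(1000)]
data = blob_a + blob_b

labels, skeleton = skeleton_cluster(data, k=2)
print(len(data), "points summarized by", len(skeleton), "skeleton points")
```

The expensive clustering step only ever sees the skeleton, which is much smaller than the dataset; the full data is touched once for sampling and once for the final nearest-center assignment. That is the source of the reduced overhead in this sketch, and it is meant only to convey the flavor of the approach described above.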

