LG AI DC Mar 13

Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers

arXiv:2603.12684 51.0 4 citations
AI Analysis

This paper addresses a key limitation of federated clustering on privacy-protected distributed data, though it is an incremental improvement over existing methods.

The paper tackles the problem of unknown and imbalanced cluster numbers in federated clustering. It proposes Fed-$k^*$-HC, a framework that uses hierarchical clustering to determine the optimal number of clusters automatically. In experiments on diverse datasets, the framework accurately identifies a proper number of clusters.

Federated Clustering (FC) is an emerging and promising approach for exploring data distribution patterns in distributed, privacy-protected data in an unsupervised manner. Existing FC methods implicitly assume that the clients' data form a known number of uniformly sized clusters. However, the true number of clusters is typically unknown, and cluster sizes are naturally imbalanced in real scenarios. Furthermore, the privacy-preserving transmission constraints in federated learning inevitably reduce the usable information, which makes robust and accurate FC extremely challenging. Accordingly, we propose a novel FC framework named Fed-$k^*$-HC, which automatically determines an optimal number of clusters $k^*$ from the data distribution explored through hierarchical clustering. To obtain the global data distribution needed to determine $k^*$, each client generates micro-subclusters, whose prototypes are then uploaded to the server for hierarchical merging. The density-based merging design allows clusters of varying sizes and shapes to be explored, and the progressive merging process self-terminates according to the neighboring relationships among the prototypes, which determines $k^*$. Extensive experiments on diverse datasets demonstrate that Fed-$k^*$-HC accurately identifies a proper number of clusters.
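The client-to-server flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify how micro-subclusters are built, how the density-based merging works, or how the neighbor-based stopping rule is defined. As stand-ins, the sketch assumes k-means with farthest-point initialization on each client, single-linkage merging on the server, and a "largest jump in merge distance" rule to pick the number of clusters. The names `local_prototypes`, `merge_prototypes`, and `blob`, and the toy data, are hypothetical.

```python
import math


def local_prototypes(points, m, iters=10):
    """Client side (assumed stand-in): m micro-subcluster prototypes via k-means."""
    # Farthest-point initialization: deterministic and spreads centers over the data.
    centers = [points[0]]
    while len(centers) < m:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(m), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def merge_prototypes(prototypes):
    """Server side (assumed stand-in): single-linkage merging with a largest-jump cut."""
    clusters = [[p] for p in prototypes]
    snapshots = [[list(c) for c in clusters]]  # snapshots[i] = partition after i merges
    dists = []                                 # dists[i] = distance of merge i
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        dists.append(d)
        snapshots.append([list(c) for c in clusters])
    # Stop just before the merge that follows the largest jump in merge distance.
    jumps = [dists[i + 1] - dists[i] for i in range(len(dists) - 1)]
    cut = max(range(len(jumps)), key=jumps.__getitem__)
    return snapshots[cut + 1]


def blob(cx, cy, side, step=0.2):
    """Toy data: a side x side grid of points anchored at (cx, cy)."""
    return [(cx + step * i, cy + step * j) for i in range(side) for j in range(side)]


# Two clients hold different, imbalanced parts of three well-separated groups.
client_1 = blob(0.0, 0.0, 8) + blob(10.0, 0.0, 4)
client_2 = blob(10.1, 0.1, 4) + blob(0.0, 10.0, 5)

# Only prototypes leave the clients; the raw points stay local.
uploaded = local_prototypes(client_1, 5) + local_prototypes(client_2, 5)
global_clusters = merge_prototypes(uploaded)
k_star = len(global_clusters)
print(k_star)
```

The server never sees raw points, only ten prototypes, yet the cut still separates the three groups on this toy data because merges within a group happen at much smaller distances than merges between groups. A largest-jump rule is only one simple way to make merging stop on its own; the paper's neighbor-relationship criterion may behave differently on overlapping or unevenly dense clusters.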
