LG · Aug 26, 2024

Provable Imbalanced Point Clustering

David Denisov, Dan Feldman, Shlomi Dolev, Michael Segal

arXiv:2408.14225v2 · 2.6 · h-index: 45

Originality: Incremental advance

AI Analysis

This work addresses the clustering of imbalanced datasets, a problem relevant to researchers and practitioners, but it appears incremental because it builds on existing coreset and clustering techniques.

The paper tackles imbalanced point clustering by proposing efficient, provable methods that use coresets to approximate the fitting of $k$ centers. It also introduces choice clustering, which combines clustering algorithms for improved performance. Experiments on real and synthetic data demonstrate the empirical benefits of both contributions.

We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$ centers to a set of points in $\mathbb{R}^d$, for any $d,k\geq 1$. To this end, we utilize \emph{coresets}, which, in the context of this paper, are weighted sets of points in $\mathbb{R}^d$ that approximate the fitting loss of every model in a given set up to a multiplicative factor of $1\pm\varepsilon$. We provide experiments (Section 3 and Appendix E) that show the empirical contribution of our methods on real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which combines clustering algorithms to yield better performance than any of them separately.
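To make the two ideas in the abstract concrete, here is a minimal NumPy sketch under several stated assumptions. It assumes the fitting loss is the standard k-means loss (sum of squared distances to the nearest center), which may differ from the paper's loss. In place of the paper's coreset construction, it uses a simpler, swapped-in importance-sampling scheme (the "lightweight coreset" of Bachem, Lucic and Krause, 2018): for any fixed set of centers its weighted loss is an unbiased estimate of the full loss, but it does not by itself give the $1\pm\varepsilon$ guarantee for every model. "Choice clustering" is interpreted here, as an assumption, as running several clustering pipelines on the coreset and keeping the one with the lowest coreset loss. All function names (`kmeans_cost`, `lightweight_coreset`, `weighted_lloyd`, `choice_clustering`, and so on) are hypothetical.

```python
import numpy as np


def kmeans_cost(P, C, w=None):
    """(Weighted) sum of squared distances from each point in P to its nearest center in C."""
    nearest = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return float(nearest.sum() if w is None else (w * nearest).sum())


def lightweight_coreset(P, m, rng):
    """Sample m weighted points; the weighted cost is unbiased for any fixed C."""
    n = len(P)
    d2 = ((P - P.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * d2 / d2.sum()  # mix of uniform and distance-to-mean sampling
    idx = rng.choice(n, size=m, replace=True, p=q)
    return P[idx], 1.0 / (m * q[idx])  # inverse-probability weights


def weighted_lloyd(S, w, C, iters=20):
    """Lloyd iterations on a weighted point set, starting from centers C."""
    C = C.copy()
    for _ in range(iters):
        labels = ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(len(C)):
            mask = labels == j
            if mask.any():  # keep the previous center if its cluster is empty
                C[j] = np.average(S[mask], axis=0, weights=w[mask])
    return C


def uniform_seeding(S, w, k, rng):
    """Pick k coreset points uniformly at random (weights ignored on purpose)."""
    return S[rng.choice(len(S), size=k, replace=False)].astype(float)


def kmeanspp_seeding(S, w, k, rng):
    """Weighted D^2 (k-means++) seeding on the coreset."""
    C = [S[rng.choice(len(S), p=w / w.sum())]]
    for _ in range(k - 1):
        d2 = ((S[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = w * d2
        C.append(S[rng.choice(len(S), p=p / p.sum())])
    return np.array(C, dtype=float)


def choice_clustering(P, k, m, rng):
    """Run several seeding + Lloyd pipelines on a coreset and keep the cheapest result."""
    S, w = lightweight_coreset(P, m, rng)
    seeders = {"uniform + Lloyd": uniform_seeding, "k-means++ + Lloyd": kmeanspp_seeding}
    candidates = {name: weighted_lloyd(S, w, seed(S, w, k, rng)) for name, seed in seeders.items()}
    # Score each candidate on the small weighted coreset instead of on all of P.
    scores = {name: kmeans_cost(S, C, w) for name, C in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best], scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic imbalanced data for illustration: one large cluster and two tiny ones.
    P = np.vstack([
        rng.normal([0, 0], 1.0, size=(2000, 2)),
        rng.normal([10, 0], 0.5, size=(20, 2)),
        rng.normal([0, 10], 0.5, size=(20, 2)),
    ])
    best, C, scores = choice_clustering(P, k=3, m=300, rng=rng)
    print("coreset scores:", {name: round(s, 1) for name, s in scores.items()})
    print("chosen:", best, "| cost on full data:", round(kmeans_cost(P, C), 1))
```

The design choice worth noting is that every candidate is both fitted and compared on the $m$-point weighted set, so each scoring pass costs $O(mk)$ distance evaluations rather than $O(nk)$. On imbalanced data like the example, a single pipeline such as uniform seeding can easily miss the tiny clusters, which is the kind of failure that choosing among several pipelines is meant to hedge against.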
