DSAILGJan 25, 2017

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

arXiv:1701.07204v469 citations
Originality Synthesis-oriented
AI Analysis

This work addresses efficient exact clustering algorithms for 1D data, which is incremental as it builds on and generalizes prior overlooked methods.

The paper tackles the problem of exact k-Means, k-Medians, and Bregman Divergence clustering in 1D, presenting overlooked existing work, reducing space usage, generalizing to data structures for any k, and extending to absolute distance and Bregman Divergence, with experiments comparing practical performance.

The $k$-Means clustering problem on $n$ points is NP-Hard for any dimension $d\ge 2$, however, for the 1D case there exists exact polynomial time algorithms. Previous literature reported an $O(kn^2)$ time dynamic programming algorithm that uses $O(kn)$ space. It turns out that the problem has been considered under a different name more than twenty years ago. We present all the existing work that had been overlooked and compare the various solutions theoretically. Moreover, we show how to reduce the space usage for some of them, as well as generalize them to data structures that can quickly report an optimal $k$-Means clustering for any $k$. Finally we also generalize all the algorithms to work for the absolute distance and to work for any Bregman Divergence. We complement our theoretical contributions by experiments that compare the practical performance of the various algorithms.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes