LGOCMLJan 1, 2019

Clustering with Distributed Data

arXiv:1901.00214v18 citations
Originality Incremental advance
AI Analysis

This addresses clustering for distributed systems such as IoT and sensor networks, but it is incremental as it builds on existing K-means methods.

The paper tackles K-means clustering in distributed networked environments like IoT, where data is spread across nodes with limited processing, by proposing a distributed algorithm called NK-means that uses local information exchange and converges to solutions close to Lloyd's minima. The result shows that NK-means can approximate Lloyd's minima arbitrarily accurately through parameter tuning.

We consider $K$-means clustering in networked environments (e.g., internet of things (IoT) and sensor networks) where data is inherently distributed across nodes and processing power at each node may be limited. We consider a clustering algorithm referred to as networked $K$-means, or $NK$-means, which relies only on local neighborhood information exchange. Information exchange is limited to low-dimensional statistics and not raw data at the agents. The proposed approach develops a parametric family of multi-agent clustering objectives (parameterized by $ρ$) and associated distributed $NK$-means algorithms (also parameterized by $ρ$). The $NK$-means algorithm with parameter $ρ$ converges to a set of fixed points relative to the associated multi-agent objective (designated as `generalized minima'). By appropriate choice of $ρ$, the set of generalized minima may be brought arbitrarily close to the set of Lloyd's minima. Thus, the $NK$-means algorithm may be used to compute Lloyd's minima of the collective dataset up to arbitrary accuracy.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes