Reliable Distributed Clustering with Redundant Data Assignment
This work addresses the challenge of scalable, reliable distributed data processing for large-scale machine learning applications, though it appears incremental because it builds on existing distributed clustering methods.
The paper tackles the problem of distributed clustering with unreliable machines by proposing a novel data assignment scheme that ensures global information is available even when some machines fail, leading to algorithms with good approximation guarantees for clustering and dimensionality reduction problems.
In this paper, we present distributed generalized clustering algorithms that can handle large-scale data across multiple machines despite straggling or unreliable machines. We propose a novel data assignment scheme that enables us to obtain global information about the entire data even when some machines fail to respond with the results of their assigned local computations. The assignment scheme leads to distributed algorithms with good approximation guarantees for a variety of clustering and dimensionality reduction problems.
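The abstract does not spell out the assignment scheme, so the following is only an illustrative sketch of the general idea of redundant data assignment, not the construction used in the paper. It assumes a simple cyclic replication rule (each point is copied to `replication` consecutive machines) and checks by brute force whether every data point is still held by at least one responding machine. All function names and parameters here are hypothetical.

```python
from itertools import combinations


def cyclic_assignment(num_points, num_machines, replication):
    """Assign point i to machines i, i+1, ..., i+replication-1 (mod num_machines).

    This cyclic rule is an assumption made for illustration only.
    """
    if not 1 <= replication <= num_machines:
        raise ValueError("replication must be between 1 and num_machines")
    assignment = {m: [] for m in range(num_machines)}
    for i in range(num_points):
        for j in range(replication):
            assignment[(i + j) % num_machines].append(i)
    return assignment


def covered_points(assignment, responders):
    """Return the set of points held by at least one responding machine."""
    covered = set()
    for m in responders:
        covered.update(assignment[m])
    return covered


def survives_all_failures(num_points, num_machines, replication, num_failures):
    """Check every possible set of num_failures failed machines by brute force."""
    assignment = cyclic_assignment(num_points, num_machines, replication)
    everything = set(range(num_points))
    for failed in combinations(range(num_machines), num_failures):
        responders = set(range(num_machines)) - set(failed)
        if covered_points(assignment, responders) != everything:
            return False
    return True
```

Under this toy rule, each point lives on `replication` distinct machines, so any set of fewer than `replication` failures still leaves every point visible to some responder. For example, `survives_all_failures(20, 5, 3, 2)` holds, while allowing three failures does not. Coverage alone says nothing about approximation quality; the guarantees claimed in the abstract depend on the paper's own scheme and analysis.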