Straggler-Agnostic and Communication-Efficient Distributed Primal-Dual Algorithm for High-Dimensional Data Mining
This paper addresses communication inefficiencies in distributed systems, particularly with high-dimensional data and stragglers, though the contribution appears incremental because it builds on prior methods.
The paper tackles the problem of high communication costs in distributed data mining by proposing a primal-dual algorithm that reduces both communication rounds and time per round, achieving linear convergence for convex problems and demonstrating faster performance in experiments.
Recently, reducing communication time between machines has become the main focus of distributed data mining. Previous methods have workers do more computation locally before the server aggregates their local solutions, so that fewer communication rounds between server and workers are required. However, these methods do not reduce the communication time per round, and they perform very poorly under certain conditions, for example, when some workers are stragglers or the dataset is high-dimensional. In this paper, we aim to reduce both the communication time per round and the number of communication rounds required. We propose a communication-efficient distributed primal-dual method with a straggler-agnostic server and bandwidth-efficient workers. We analyze its convergence properties and prove that the proposed method guarantees a linear convergence rate to the optimal solution for convex problems. Finally, we conduct large-scale experiments in simulated and real distributed systems, and the experimental results demonstrate that the proposed method is much faster than the compared methods.
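The abstract does not spell out how the server and workers achieve these properties, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes that "bandwidth-efficient" means each worker sends only its k largest-magnitude update entries, and that "straggler-agnostic" means the server aggregates as soon as a fixed number of workers have reported. The function names, the parameters `k` and `group_size`, and the toy data are all hypothetical, and the primal-dual updates themselves are omitted.

```python
import heapq


def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a dense update as {index: value}.

    Assumption: top-k sparsification stands in for a 'bandwidth-efficient' worker.
    """
    idx = heapq.nlargest(k, range(len(update)), key=lambda i: abs(update[i]))
    return {i: update[i] for i in sorted(idx)}


def aggregate_first_arrivals(arrivals, group_size, dim):
    """Sum the sparse updates of the first `group_size` workers to report.

    Assumption: ignoring late workers stands in for a 'straggler-agnostic' server.
    arrivals: list of (arrival_time, worker_id, sparse_update) tuples.
    Returns the dense aggregated update and the ids of the workers used.
    """
    earliest = sorted(arrivals, key=lambda a: a[0])[:group_size]
    total = [0.0] * dim
    for _, _, sparse in earliest:
        for i, value in sparse.items():
            total[i] += value
    return total, [worker_id for _, worker_id, _ in earliest]


# Toy data (hypothetical): worker 2 is a straggler and reports much later.
updates = {
    0: [0.1, -2.0, 0.0, 0.5],
    1: [1.5, 0.2, -0.1, 0.0],
    2: [0.0, 0.0, 3.0, -0.4],
}
arrival_time = {0: 1.0, 1: 1.2, 2: 9.0}

arrivals = [
    (arrival_time[w], w, top_k_sparsify(u, k=2)) for w, u in updates.items()
]
total, used = aggregate_first_arrivals(arrivals, group_size=2, dim=4)
print(used)   # ids of the workers whose updates were aggregated
print(total)  # dense aggregate built from the sparse messages
```

In this toy run the server proceeds with workers 0 and 1 without waiting for worker 2, and each message carries two entries instead of four. A real system would also have to fold the straggler's late update and the dropped entries back in later, which this sketch leaves out.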