DC, LG · Dec 27, 2016

ASAP: Asynchronous Approximate Data-Parallel Computation

arXiv:1612.08608v14.33 citations

Originality: Highly original

AI Analysis

This paper addresses performance bottlenecks in distributed systems for approximate workloads such as graph processing and machine learning, offering a novel approach that reduces synchronization costs while maintaining accuracy.

The paper tackles the high synchronization overheads in distributed data-parallel computation by introducing ASAP, a model with asynchronous and approximate processing semantics, which achieves 2-10X speedups in convergence and up to 10X savings in network costs for distributed machine learning applications.

Emerging workloads, such as graph processing and machine learning, are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines using bulk-synchronous processing (BSP) or other synchronous processing paradigms such as map-reduce. However, data-parallel processing primitives such as repeated barrier and reduce operations introduce high synchronization overheads. Hence, many existing data-processing platforms use asynchrony and staleness to improve data-parallel job performance. Often, these systems simply replace synchronous communication between the worker nodes in the cluster with asynchronous communication. This improves data-processing throughput but results in poor accuracy of the final output, since different workers may progress at different speeds and process inconsistent intermediate outputs. In this paper, we present ASAP, a model that provides asynchronous and approximate processing semantics for data-parallel computation. ASAP provides fine-grained worker synchronization using NOTIFY-ACK semantics, which allow independent workers to run asynchronously. ASAP also provides stochastic reduce, which offers approximate but guaranteed convergence to the same result as an aggregated all-reduce. In our results, we show that ASAP reduces synchronization costs, provides 2-10X speedups in convergence and up to 10X savings in network costs for distributed machine learning applications, and retains strong convergence guarantees.
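The abstract does not spell out how stochastic reduce is implemented. The sketch below only illustrates the general idea that exchanging values with a few peers, rather than with every worker, can still converge to the all-reduce result. It assumes a fixed ring topology and equal-weight neighbour averaging (a standard gossip-averaging scheme); the names `ring_neighbours`, `sparse_reduce_round`, and `sparse_reduce` are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: gossip-style averaging on a ring, NOT the paper's algorithm.
# Assumption: each worker holds one scalar and talks only to its two ring neighbours.

def ring_neighbours(i, n):
    """Indices a worker averages over: its left neighbour, itself, its right neighbour."""
    return [(i - 1) % n, i, (i + 1) % n]

def sparse_reduce_round(values):
    """One communication round: every worker replaces its value with a local average."""
    n = len(values)
    return [sum(values[j] for j in ring_neighbours(i, n)) / 3.0 for i in range(n)]

def sparse_reduce(values, rounds):
    """Repeat local averaging; the equal weights preserve the global sum each round."""
    x = [float(v) for v in values]
    for _ in range(rounds):
        x = sparse_reduce_round(x)
    return x

values = [1.0, 5.0, 3.0, 7.0, 9.0, 2.0, 4.0, 1.0]   # one value per worker
exact = sum(values) / len(values)                    # what an all-reduce average would give
approx = sparse_reduce(values, rounds=200)           # each worker's value after sparse rounds
max_error = max(abs(a - exact) for a in approx)
print(exact, max_error)
```

Each round costs every worker two messages instead of a full all-reduce, at the price of needing several rounds before all workers agree. Because each worker's weights sum to one and the scheme is symmetric, the global sum is preserved and every worker's value approaches the global mean.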

