NI | DC | LG | Apr 29, 2020

Caramel: Accelerating Decentralized Distributed Deep Learning with Computation Scheduling

arXiv:2004.14020v1
AI Analysis

Caramel addresses a communication bottleneck in distributed DNN training, offering practitioners a hardware-independent solution that requires no user-level or framework-level changes.

The paper tackles performance degradation in decentralized distributed deep learning (AllReduce) by developing Caramel, a system that uses model-aware computation scheduling and communication optimizations to accelerate training, achieving up to 3.62x improvement in iteration time.

The method of choice for parameter aggregation in Deep Neural Network (DNN) training, a network-intensive task, is shifting from the Parameter Server model to decentralized aggregation schemes (AllReduce) inspired by theoretical guarantees of better performance. However, current implementations of AllReduce overlook the interdependence of communication and computation, resulting in significant performance degradation. In this paper, we develop Caramel, a system that accelerates decentralized distributed deep learning through model-aware computation scheduling and communication optimizations for AllReduce. Caramel achieves this goal through (a) computation DAG scheduling that expands the feasible window of transfer for each parameter (transfer boundaries), and (b) network optimizations for smoothening of the load including adaptive batching and pipelining of parameter transfers. Caramel maintains the correctness of the dataflow model, is hardware-independent, and does not require any user-level or framework-level changes. We implement Caramel over TensorFlow and show that the iteration time of DNN training can be improved by up to 3.62x in a cloud environment.
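To make the idea of batching parameter transfers concrete, here is a minimal Python sketch. It is not Caramel's algorithm: the abstract does not specify how batches are formed, so this assumes a simple greedy rule with a fixed byte threshold (`min_batch_bytes`, a hypothetical parameter) rather than an adaptive one. The function name, layer names, and sizes are all illustrative assumptions.

```python
from typing import List, Tuple


def batch_transfers(
    grads: List[Tuple[str, int]], min_batch_bytes: int
) -> List[List[str]]:
    """Greedily group gradients, in the order they become ready, into batches.

    A batch is closed as soon as its total size reaches min_batch_bytes, so
    small tensors share one aggregation call instead of each paying the
    per-call overhead on its own. (Assumed rule, not taken from the paper.)
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in grads:
        current.append(name)
        current_bytes += size
        if current_bytes >= min_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
    if current:  # flush whatever is left at the end of the iteration
        batches.append(current)
    return batches


# Hypothetical gradients in the order backpropagation produces them
# (last layer first), with sizes in bytes.
ready_order = [
    ("fc2/bias", 4_000),
    ("fc2/kernel", 4_000_000),
    ("fc1/bias", 16_000),
    ("conv2/kernel", 300_000),
    ("conv1/kernel", 10_000),
]

batches = batch_transfers(ready_order, min_batch_bytes=1_000_000)
for i, batch in enumerate(batches):
    print(i, batch)
```

With these made-up sizes, five individual transfers collapse into two batched ones. A fixed threshold is the simplest possible choice; an adaptive scheme would presumably adjust it based on observed network conditions.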

