DC, LG | Jul 24, 2018

Supporting Very Large Models using Automatic Dataflow Graph Partitioning

arXiv:1807.08887v2 | 172 citations
Originality: Incremental advance
AI Analysis

This work addresses memory and efficiency challenges for researchers and practitioners training large-scale models, representing an incremental improvement in distributed training systems.

The paper tackles the problem of training very large deep neural network models by partitioning them across multiple GPUs to reduce memory footprint, achieving 25% to 400% speedup over alternative approaches in experiments on an 8-GPU machine.

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators in order to work transparently with a general-purpose deep learning platform like MXNet. In order to automatically partition each operator, we propose to describe the semantics of an operator in a simple language which represents tensors as lambda functions mapping from tensor coordinates to values. To optimally partition different operators in a dataflow graph, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves 25% - 400% speedup over alternative approaches to train very large models.
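The idea of describing an operator as a lambda from output coordinates to values, and then partitioning it, can be illustrated with a small sketch. This is not Tofu's actual description language or its search algorithm; the names `from_rows`, `matmul`, `materialize`, and `partitioned_matmul` are hypothetical, and the "workers" are simulated by a plain loop instead of GPUs. It only shows that once an output tensor is expressed as a function of its coordinates, each worker can compute a disjoint range of output rows independently.

```python
from typing import Callable, List

# Assumption for illustration: a 2-D tensor is modelled as a function
# from coordinates (i, j) to a value.
Tensor2D = Callable[[int, int], float]


def from_rows(rows: List[List[float]]) -> Tensor2D:
    """Wrap a nested list so it can be read through coordinates."""
    return lambda i, j: rows[i][j]


def matmul(a: Tensor2D, b: Tensor2D, k: int) -> Tensor2D:
    """Describe C = A @ B as a lambda: C[i, j] = sum_r A[i, r] * B[r, j]."""
    return lambda i, j: sum(a(i, r) * b(r, j) for r in range(k))


def materialize(t: Tensor2D, rows: range, n_cols: int) -> List[List[float]]:
    """Evaluate the lambda only for the given range of output rows."""
    return [[t(i, j) for j in range(n_cols)] for i in rows]


def partitioned_matmul(a_rows, b_rows, n_workers: int):
    """Split the output along its row dimension across simulated workers.

    Each worker evaluates a contiguous block of output rows; the blocks
    are concatenated to form the full result.
    """
    m, k, n = len(a_rows), len(b_rows), len(b_rows[0])
    c = matmul(from_rows(a_rows), from_rows(b_rows), k)
    chunk = (m + n_workers - 1) // n_workers  # ceiling division
    out = []
    for w in range(n_workers):
        rows = range(w * chunk, min((w + 1) * chunk, m))
        out.extend(materialize(c, rows, n))
    return out
```

In this row-wise split each worker needs only its own rows of A but all of B. Splitting along a different dimension changes which inputs must be shared, and that difference is the kind of communication cost a partitioning search would have to weigh.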

