DC · LG · Jul 25, 2020

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism

arXiv:2007.12856v1 · 42 citations
Originality: Incremental advance
AI Analysis

This work addresses the memory and scalability challenges of deep learning in scientific workflows, enabling more accurate models trained on large, high-dimensional data.

The paper tackles the problem of training large-scale 3D convolutional neural networks, which is costly and memory-intensive, by developing scalable hybrid-parallel algorithms that combine data and spatial parallelism. The result is good weak and strong scaling on up to 2K GPUs, enabling training with larger samples and achieving an order-of-magnitude improvement in prediction accuracy for CosmoFlow.

We present scalable hybrid-parallel algorithms for training large-scale 3D convolutional neural networks. Deep learning-based emerging scientific workflows often require model training with large, high-dimensional samples, which can make training much more costly and even infeasible due to excessive memory usage. We solve these challenges by extensively applying hybrid parallelism throughout the end-to-end training pipeline, including both computations and I/O. Our hybrid-parallel algorithm extends the standard data parallelism with spatial parallelism, which partitions a single sample in the spatial domain, realizing strong scaling beyond the mini-batch dimension with a larger aggregated memory capacity. We evaluate our proposed training algorithms with two challenging 3D CNNs, CosmoFlow and 3D U-Net. Our comprehensive performance studies show that good weak and strong scaling can be achieved for both networks using up to 2K GPUs. More importantly, we enable training of CosmoFlow with much larger samples than previously possible, realizing an order-of-magnitude improvement in prediction accuracy.
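The idea of spatial parallelism described above can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`conv3d_valid`, `conv3d_same`, `spatial_parallel_conv3d`) are hypothetical, the "ranks" are simulated by a loop, and the halo exchange that a distributed version would perform with neighbor communication is replaced here by slicing a shared padded array. The sketch assumes a single-channel volume, a cubic odd-sized kernel, and partitioning along the depth axis only.

```python
import numpy as np


def conv3d_valid(x, w):
    """'Valid' 3D cross-correlation of volume x with kernel w (no padding)."""
    kd, kh, kw = w.shape
    out_shape = (x.shape[0] - kd + 1, x.shape[1] - kh + 1, x.shape[2] - kw + 1)
    out = np.zeros(out_shape)
    # Accumulate one shifted, weighted copy of the input per kernel element.
    for i in range(kd):
        for j in range(kh):
            for l in range(kw):
                out += w[i, j, l] * x[i:i + out_shape[0],
                                      j:j + out_shape[1],
                                      l:l + out_shape[2]]
    return out


def conv3d_same(x, w):
    """Reference: zero-padded 'same' convolution on the whole sample."""
    r = w.shape[0] // 2
    return conv3d_valid(np.pad(x, r), w)


def spatial_parallel_conv3d(x, w, num_parts):
    """Split one sample into depth slabs, convolve each slab independently.

    Each simulated rank sees only its slab plus r halo planes per side,
    which is the data a real rank would receive from its neighbors.
    """
    r = w.shape[0] // 2
    xp = np.pad(x, r)  # zero padding at the global boundary
    bounds = np.linspace(0, x.shape[0], num_parts + 1).astype(int)
    outputs = []
    for p in range(num_parts):
        lo, hi = bounds[p], bounds[p + 1]
        # Slab x[lo:hi] with halos corresponds to xp[lo : hi + 2r].
        local = xp[lo:hi + 2 * r]
        outputs.append(conv3d_valid(local, w))
    # Concatenating the per-rank outputs reconstructs the full result.
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
sample = rng.standard_normal((8, 6, 6))   # one 3D sample (assumed size)
kernel = rng.standard_normal((3, 3, 3))   # one 3x3x3 filter (assumed size)
partitioned = spatial_parallel_conv3d(sample, kernel, num_parts=4)
print(np.allclose(partitioned, conv3d_same(sample, kernel)))
```

In this sketch each simulated rank holds only its slab plus two halo planes rather than the full volume, which is the sense in which partitioning a sample enlarges the aggregate memory available to it. A hybrid scheme would additionally distribute different samples of the mini-batch across groups of such ranks.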

Code Implementations: 1 repo
