LG · PF · Oct 19, 2022

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction

arXiv:2210.10246v2 · 8 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This paper addresses the limited accelerator memory that researchers and practitioners face when training large Transformer models, offering an incremental improvement over existing methods.

The paper tackles the memory bottleneck in training Transformer-based models by proposing Tempo, which provides drop-in replacements for the GELU, LayerNorm, and Attention layers to reduce memory usage. On BERT Large pre-training, this enables up to 2x higher batch sizes and 16% higher training throughput over the baseline; on GPT2 and RoBERTa, the reported speedups are 19% and 26%.

Training deep learning models can be computationally expensive. Prior works have shown that increasing the batch size can potentially lead to better overall throughput. However, the batch size is frequently limited by the accelerator memory capacity due to the activations/feature maps stored for the training backward pass, as larger batch sizes require larger feature maps to be stored. Transformer-based models, which have recently seen a surge in popularity due to their good performance and applicability to a variety of tasks, have a similar problem. To remedy this issue, we propose Tempo, a new approach to efficiently use accelerator (e.g., GPU) memory resources for training Transformer-based models. Our approach provides drop-in replacements for the GELU, LayerNorm, and Attention layers, reducing the memory usage and ultimately leading to more efficient training. We implement Tempo and evaluate the throughput, memory usage, and accuracy/loss on the BERT Large pre-training task. We demonstrate that Tempo enables up to 2x higher batch sizes and 16% higher training throughput over the state-of-the-art baseline. We also evaluate Tempo on GPT2 and RoBERTa models, showing 19% and 26% speedup over the baseline.
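The abstract's argument is that stored activations grow with batch size, so shrinking the per-sample activation footprint raises the largest batch that fits in memory. The sketch below works through that arithmetic. Every number in it is a hypothetical placeholder rather than a measurement from the paper, and the helper name `max_batch_size` is invented for illustration; it is not part of Tempo.

```python
def max_batch_size(memory_budget_bytes, fixed_bytes, activation_bytes_per_sample):
    """Largest batch whose stored activations fit beside the fixed costs.

    Assumes activation memory grows linearly with batch size, which is a
    simplification of real training behaviour.
    """
    available = memory_budget_bytes - fixed_bytes
    if available <= 0:
        return 0
    return available // activation_bytes_per_sample


GIB = 1024 ** 3
MIB = 1024 ** 2

# All values below are hypothetical, chosen only to make the arithmetic easy.
budget = 16 * GIB        # accelerator memory capacity
fixed = 6 * GIB          # weights, gradients, optimizer state
per_sample = 512 * MIB   # activations kept for the backward pass, per sample

baseline = max_batch_size(budget, fixed, per_sample)
# Halving the per-sample activation footprint doubles the batch that fits.
reduced = max_batch_size(budget, fixed, per_sample // 2)

print(baseline, reduced)
```

Under these assumed numbers, 10 GiB is left for activations, so the baseline fits 20 samples and the halved footprint fits 40. Because the fixed costs do not shrink, the gain in batch size comes entirely from the activation term, which is the term the replacement layers target.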

Code Implementations: 1 repo
