NE · LG · Jun 10, 2016

Memory-Efficient Backpropagation Through Time

arXiv:1606.03401v1 · 269 citations
AI Analysis

This paper addresses a practical memory bottleneck for researchers and practitioners who train RNNs on long sequences, offering a substantial reduction in memory use compared with standard BPTT.

The paper tackles the high memory consumption of backpropagation through time (BPTT) when training recurrent neural networks (RNNs). It proposes a dynamic programming approach that balances caching against recomputation to fit within a user-set memory budget. For sequences of length 1000, it reports a 95% memory reduction at the cost of one-third more time per iteration.

We propose a novel approach to reduce the memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy that minimizes the computational cost. Computational devices have limited memory capacity, and maximizing computational performance given a fixed memory budget is a practical use case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than standard BPTT.
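The trade-off between caching and recomputation described in the abstract can be illustrated with a small dynamic program. What follows is a minimal sketch under a simplified cost model, not the authors' implementation. It assumes that only hidden states are cached, that cost is counted in forward-step evaluations, and that `m` is the number of hidden states that can be held at once (including the starting state of the subsequence). The names `min_cost`, `t`, `m`, and `y` are illustrative choices, and the boundary conditions are assumptions of this sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_cost(t: int, m: int) -> int:
    """Minimum number of forward-step evaluations needed to backpropagate
    through t time steps when at most m hidden states fit in memory.

    Assumed cost model (a simplification made for this sketch):
      * t == 1: one forward evaluation is enough.
      * m == 1: only the starting state is kept, so every step is
        recomputed from the start: t + (t - 1) + ... + 1 evaluations.
      * otherwise: advance y steps, cache the state reached there, handle
        the right part (t - y steps) with m - 1 slots, then release the
        slot and handle the left part (y steps) with m slots.
    """
    if t < 1 or m < 1:
        raise ValueError("t and m must be positive integers")
    if t == 1:
        return 1
    if m == 1:
        return t * (t + 1) // 2
    return min(
        y + min_cost(t - y, m - 1) + min_cost(y, m)
        for y in range(1, t)
    )


if __name__ == "__main__":
    t = 100
    for m in (1, 2, 5, 10, 100):
        # Cost falls from t(t+1)/2 at m = 1 towards 2t - 1 as m grows.
        print(f"t={t:4d}  m={m:4d}  forward evaluations={min_cost(t, m)}")
```

Under this model the cost is quadratic in the sequence length when only one state fits in memory and approaches `2t - 1` once every state can be cached. The memoized recurrence picks the cheapest split point `y` for each budget in between, which is the kind of policy search the abstract refers to.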
