Memory-Efficient Backpropagation Through Time
This work addresses a practical memory bottleneck for researchers and practitioners who train RNNs on long sequences, substantially reducing memory use relative to standard BPTT.
The paper tackles the high memory consumption of backpropagation through time (BPTT) when training recurrent neural networks (RNNs). It proposes a dynamic programming approach that balances caching against recomputation to fit within a user-set memory budget. For sequences of length 1000, this achieves a 95% memory reduction at the cost of only one-third more time per iteration.
We propose a novel approach to reduce the memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy that minimizes the computational cost. Computational devices have limited memory capacity, and maximizing computational performance given a fixed memory budget is a practical use case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than standard BPTT.
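The trade-off between caching and recomputation can be illustrated with a small dynamic program. The following is a minimal sketch under a simplified cost model that is an assumption of this example, not necessarily the paper's exact formulation: cost is counted in forward-step evaluations, the state at the start of a segment is assumed to be available already, and the budget `m` is the number of additional hidden states that may be cached. The function name `min_forward_steps` and the table layout are hypothetical.

```python
def min_forward_steps(t_max, m_max):
    """Return cost[m][t]: the fewest forward-step evaluations needed to
    backpropagate through t steps with m spare cache slots.

    Assumed model: backpropagating one step needs that step's input state
    plus one forward evaluation of the step itself.
    """
    cost = [[0] * (t_max + 1) for _ in range(m_max + 1)]

    # No spare slots: every step is rebuilt from the segment start,
    # so step i costs i evaluations and the total is t * (t + 1) / 2.
    for t in range(1, t_max + 1):
        cost[0][t] = t * (t + 1) // 2

    for m in range(1, m_max + 1):
        for t in range(1, t_max + 1):
            best = cost[0][t]  # fallback: cache nothing at all
            # Run y steps forward, cache that state, then handle the right
            # part with one slot fewer and the left part with the slot freed.
            for y in range(1, t):
                candidate = y + cost[m - 1][t - y] + cost[m][y]
                best = min(best, candidate)
            cost[m][t] = best
    return cost


if __name__ == "__main__":
    table = min_forward_steps(50, 5)
    for m in (0, 1, 2, 5):
        print(f"slots={m}: forward steps for length 50 = {table[m][50]}")
```

In this toy model the cost never increases as slots are added, and with `t - 1` slots it drops to `2t - 1` evaluations (one forward pass plus one re-evaluation per step). The triple loop takes on the order of `t_max**2 * m_max` operations, so the sketch is meant for small illustrative sizes only.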