Training DNNs in O(1) memory with MEM-DFA using Random Matrices
This work addresses the memory bottleneck in training very deep neural networks, a problem for researchers and practitioners who work with large models or limited hardware.
This paper introduces MEM-DFA, a method that reduces the memory needed to train a deep neural network to O(1) in the number of layers. It does so by exploiting the independence of layers in direct feedback alignment (DFA), which lets it avoid storing all activation vectors simultaneously, as standard backpropagation (BP), feedback alignment (FA), and DFA must.
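For reference, the standard update rules from the FA and DFA literature make this layer independence concrete. The notation is an assumption for illustration, not taken from this paper: layer l has weights W_l, pre-activation a_l, and output h_l = f(a_l), with h_0 = x; e is the output error (e.g. e = ŷ − y for softmax with cross-entropy or a linear output with squared error); each B denotes a fixed random matrix of the appropriate shape; and the output layer uses δa_L = e.

```latex
\begin{aligned}
a_l &= W_l\, h_{l-1}, \qquad h_l = f(a_l), \qquad e = \hat{y} - y,\\
\text{BP:}\quad \delta a_l &= \left(W_{l+1}^{\top}\, \delta a_{l+1}\right) \odot f'(a_l),\\
\text{FA:}\quad \delta a_l &= \left(B_{l+1}\, \delta a_{l+1}\right) \odot f'(a_l),\\
\text{DFA:}\quad \delta a_l &= \left(B_l\, e\right) \odot f'(a_l),\\
\Delta W_l &= -\eta\, \delta a_l\, h_{l-1}^{\top}.
\end{aligned}
```

In BP and FA, δa_l depends on δa_{l+1}, so the backward sweep needs every stored h_{l-1} and a_l. In DFA it depends only on e and layer l's own a_l and h_{l-1}, so once e is known, each layer can in principle be updated while its activations are recomputed.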
This work presents a method that reduces memory consumption to a constant when training deep neural networks. The algorithm builds on two more biologically plausible alternatives to backpropagation (BP), direct feedback alignment (DFA) and feedback alignment (FA), which use random matrices to propagate the error. The proposed method, memory-efficient direct feedback alignment (MEM-DFA), exploits the greater independence between layers in DFA to avoid storing all activation vectors at once, unlike standard BP, FA, and DFA. As a result, the memory usage of our algorithm is constant regardless of the number of layers in the network. The method increases the computational cost only by a constant factor: one extra forward pass. MEM-DFA, BP, FA, and DFA were evaluated, together with their memory profiles, on the MNIST and CIFAR-10 datasets using various neural network models. Our experiments agree with our theoretical results and show that MEM-DFA uses significantly less memory than the other algorithms.
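The following NumPy sketch illustrates one way to realize the idea described above; it is not the authors' implementation. It assumes fully connected layers without biases, ReLU hidden units, a linear output layer with squared-error loss (so e = ŷ − y), and explicitly stored feedback matrices; all helper names (`init_network`, `layer`, `local_delta`, `dfa_step`, `mem_dfa_step`) are hypothetical. `dfa_step` keeps every layer's input and pre-activation until the error is known, while `mem_dfa_step` runs one forward pass to obtain the output error and then a second forward pass that recomputes each layer, updates it, and drops its input.

```python
import numpy as np

# Assumptions (not from the paper): no biases, ReLU hidden units, a linear
# output layer, and squared-error loss, so the output error is e = y_hat - y.

def relu(a):
    return np.maximum(a, 0.0)

def relu_grad(a):
    return (a > 0).astype(a.dtype)

def init_network(sizes, seed=0):
    """Hypothetical helper: weights W_l and fixed random feedback matrices B_l."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_out, n_in))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    # B_l projects the output error (length sizes[-1]) onto hidden layer l.
    Bs = [rng.normal(0.0, 1.0 / np.sqrt(sizes[-1]), (n, sizes[-1]))
          for n in sizes[1:-1]]
    return Ws, Bs

def layer(Ws, l, h):
    """Pre-activation and output of layer l given its input h."""
    a = Ws[l] @ h
    return a, (relu(a) if l < len(Ws) - 1 else a)

def local_delta(Bs, e, l, a, n_layers):
    """DFA error signal for layer l: depends only on e and the layer's own a_l."""
    if l == n_layers - 1:
        return e  # the output layer receives the true error
    return (Bs[l] @ e) * relu_grad(a)

def dfa_step(Ws, Bs, x, y, lr):
    """Standard DFA: keep every (h_{l-1}, a_l) pair until the error is known."""
    inputs, pre_acts, h = [], [], x
    for l in range(len(Ws)):
        inputs.append(h)
        a, h = layer(Ws, l, h)
        pre_acts.append(a)
    e = h - y
    for l in range(len(Ws)):
        Ws[l] -= lr * np.outer(local_delta(Bs, e, l, pre_acts[l], len(Ws)), inputs[l])

def mem_dfa_step(Ws, Bs, x, y, lr):
    """Two forward passes; only one layer's vectors (plus e) are alive at a time."""
    h = x
    for l in range(len(Ws)):          # pass 1: compute the output error only
        _, h = layer(Ws, l, h)
    e = h - y
    h = x
    for l in range(len(Ws)):          # pass 2: recompute, update, discard
        a, h_next = layer(Ws, l, h)   # uses W_l before it is updated
        Ws[l] -= lr * np.outer(local_delta(Bs, e, l, a, len(Ws)), h)
        h = h_next                    # h_{l-1} is no longer referenced

# Usage: both steps should produce the same updated weights.
sizes = [4, 16, 16, 16, 3]
Ws_dfa, Bs = init_network(sizes)
Ws_mem = [W.copy() for W in Ws_dfa]
rng = np.random.default_rng(1)
x, y = rng.normal(size=4), rng.normal(size=3)
dfa_step(Ws_dfa, Bs, x, y, lr=0.01)
mem_dfa_step(Ws_mem, Bs, x, y, lr=0.01)
print(all(np.allclose(a, b) for a, b in zip(Ws_dfa, Ws_mem)))  # True
```

Because each recomputed pre-activation uses W_l before that layer is updated, the two steps yield identical weights. In this sketch, `mem_dfa_step` holds only the current `h`, `a`, `h_next`, and `e`, so its activation storage does not grow with depth (parameter storage, including the B_l, is not counted), at the price of the second forward pass.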