Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch
Rockmate addresses memory constraints for researchers and practitioners training large models such as Transformers and ResNets. The contribution is incremental in that it builds on existing re-materialization techniques.
The authors tackle the high memory consumption of training PyTorch deep neural network models by developing Rockmate, an automatic tool that reduces activation memory usage by a factor of 2 to 5 at the cost of a 10% to 20% computational overhead.
We propose Rockmate to control the memory requirements when training PyTorch DNN models. Rockmate is an automatic tool that starts from the model code and generates an equivalent model that uses a predefined amount of memory for activations, at the cost of a few re-computations. Rockmate automatically detects the structure of computational and data dependencies and rewrites the initial model as a sequence of complex blocks. We show that such a structure is widespread and can be found in many models in the literature (Transformer-based models, ResNet, RegNets, ...). This structure allows us to solve the problem quickly and efficiently, using an adaptation of Checkmate (general, but too slow on the whole model) at the level of individual blocks and an adaptation of Rotor (fast, but limited to sequential models) at the level of the sequence itself. We show through experiments on many models that Rockmate is as fast as Rotor and as efficient as Checkmate, and that in many cases it significantly reduces the memory consumption of activations (by a factor of 2 to 5) for a modest overhead (on the order of 10% to 20%). Rockmate is open source and available at https://github.com/topal-team/rockmate.
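The trade-off between activation memory and re-computation described above can be illustrated with a toy model. The sketch below is not Rockmate's algorithm, nor that of Checkmate or Rotor: it assumes a purely sequential chain of identical layers, assumes that the backward pass of a layer needs only that layer's input, and uses plain periodic checkpointing. The function name `periodic_checkpoint_cost` and its parameters are hypothetical and chosen for illustration only.

```python
def periodic_checkpoint_cost(n_layers, segment_len):
    """Toy cost model for periodic checkpointing on a sequential chain.

    Assumptions (illustrative only, not Rockmate's model):
      * all layers have the same activation size and forward cost;
      * the backward step of a layer needs only that layer's input;
      * only the input of each segment is kept during the forward pass,
        except for the last segment, whose activations are all kept.

    Returns (peak_stored_activations, extra_forward_steps).
    """
    # Split the chain into consecutive segments of at most segment_len layers.
    lengths = [min(segment_len, n_layers - start)
               for start in range(0, n_layers, segment_len)]
    last = len(lengths) - 1
    peak = 0
    extra_forward = 0
    for j, seg_len in enumerate(lengths):
        # While segment j is differentiated, the inputs of segments 0..j are
        # still stored, plus the seg_len - 1 inner activations of segment j.
        peak = max(peak, (j + 1) + (seg_len - 1))
        # Every segment except the last has its inner activations re-computed
        # once from the segment input before its backward steps.
        if j != last:
            extra_forward += seg_len - 1
    return peak, extra_forward


if __name__ == "__main__":
    n = 16
    for k in (1, 2, 4, 8, 16):
        peak, extra = periodic_checkpoint_cost(n, k)
        print(f"segment_len={k:2d}  peak={peak:2d}  extra_forward={extra:2d}")
```

In this toy setting, 16 layers with segments of 4 layers keep at most 7 activations instead of 16, at the price of 9 additional forward steps. Real models have layers of unequal size and cost and non-sequential dependencies, which is why a solver that chooses what to keep and what to re-compute under a given memory budget is needed.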