CV · Feb 28, 2022

DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

Joya Chen, Kai Xu, Yuhui Wang, Yifei Cheng, Angela Yao

arXiv:2202.13808v37.39 citations · Has Code

Originality Highly original

AI Analysis

DropIT addresses GPU memory constraints for researchers and practitioners training large models, offering a novel approach to memory-efficient training with reported accuracy gains.

The paper tackles the GPU memory bottleneck in deep neural network training by proposing DropIT, a method that drops min-k elements of intermediate tensors to reduce memory footprint, achieving up to 90% sparsity while improving testing accuracy on tasks like classification and object detection.

A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint: Dropping Intermediate Tensors (DropIT). DropIT drops min-k elements of the intermediate tensors and approximates gradients from the sparsified tensors in the backward pass. Theoretically, DropIT reduces noise on estimated gradients and therefore has a higher rate of convergence than vanilla-SGD. Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher testing accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection, instance segmentation). Our code and models are available at https://github.com/chenjoya/dropit.
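
The idea in the abstract can be illustrated with a minimal pure-Python sketch for a single fully-connected layer y = Wx. It is not the authors' implementation (that lives in the linked repository). It assumes that "min-k" means the k elements with the smallest absolute value, and the names `drop_min_k` and `approx_weight_grad` are hypothetical. The forward pass would cache only the kept indices and values of x, and the backward pass would build an approximate weight gradient from them.

```python
import heapq

def drop_min_k(x, drop_ratio):
    """Sparsify a flat list x by dropping its smallest-magnitude elements.

    Assumption: "min-k" is taken to mean smallest absolute value.
    Returns (kept_indices, kept_values, original_length).
    """
    n = len(x)
    keep = n - int(n * drop_ratio)  # number of elements retained
    idx = heapq.nlargest(keep, range(n), key=lambda i: abs(x[i]))
    idx.sort()  # restore positional order of the kept entries
    return idx, [x[i] for i in idx], n

def approx_weight_grad(grad_y, sparse_x):
    """Approximate dL/dW = outer(grad_y, x) for y = W x.

    Only the kept entries of x contribute; dropped positions yield zeros.
    """
    idx, vals, n = sparse_x
    grad_w = [[0.0] * n for _ in grad_y]
    for row, g in zip(grad_w, grad_y):
        for i, v in zip(idx, vals):
            row[i] = g * v
    return grad_w

# Cache 2 of 4 input elements, then form the approximate gradient.
cached = drop_min_k([0.1, -2.0, 0.05, 3.0], drop_ratio=0.5)
grad_w = approx_weight_grad([1.0, -1.0], cached)
```

In this sketch the memory saving comes from storing `cached` (indices plus values) instead of the dense input. A practical version would operate on GPU tensors and also account for the overhead of storing indices.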
