Speeding up Resnet Architecture with Layers Targeted Low Rank Decomposition
This work addresses efficient neural network deployment for practitioners by providing a hardware-aware compression method, though it is incremental, since it builds on existing low-rank decomposition techniques.
The paper tackles speeding up ResNet50 by compressing selected layers with hardware-aware low-rank decomposition. It reports a 5.36% training speedup on Huawei Ascend910 and a 15.79% inference speedup on Huawei Ascend310, with only a 1% accuracy drop on ImageNet.
Compressing a neural network can speed up both its training and its inference. In this research, we study compression by applying low-rank decomposition to network layers. Our research demonstrates that, to obtain a speedup, the compression methodology should be aware of the underlying hardware, because an analysis of that hardware is needed to choose which layers to compress. We demonstrate the advantage of our approach through a case study that compresses ResNet50 and trains it on the full ImageNet-ILSVRC2012 dataset. We tested on two different hardware systems: Nvidia V100 and Huawei Ascend910. With hardware-targeted compression, results showed a 5.36% training speedup on Ascend910 and a 15.79% inference speedup on Ascend310, with only a 1% drop in accuracy compared to the original uncompressed model.
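To make the idea of low-rank layer decomposition concrete, here is a minimal sketch in Python with NumPy. It is not the paper's method. This summary does not say which decomposition the authors apply to each layer type, so the sketch assumes the simplest case: a truncated SVD of one fully connected weight matrix W (out x in), factorized as W ≈ A @ B. The function names (`low_rank_factorize`, `macs_dense`, `macs_low_rank`, `break_even_rank`, `select_layers`) are hypothetical. The sketch shows two things. First, factorization reduces multiply-accumulates (MACs) only when the rank is below out*in/(out+in). Second, in keeping with the abstract's point, a lower MAC count does not guarantee a speedup. A hardware-aware selection step therefore keeps a factorized layer only when its latency, as measured on the target device, actually beats the original.

```python
import numpy as np


def low_rank_factorize(W, rank):
    """Approximate W (out x in) by A @ B using a truncated SVD (assumed stand-in).

    A has shape (out, rank) and B has shape (rank, in). The layer y = W @ x then
    becomes two smaller layers: first z = B @ x, then y = A @ z.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])          # split singular values between both factors
    A = U[:, :rank] * sqrt_s            # (out, rank): columns scaled by sqrt(S)
    B = sqrt_s[:, None] * Vt[:rank]     # (rank, in): rows scaled by sqrt(S)
    return A, B


def macs_dense(out_f, in_f):
    """MACs for one matrix-vector product with the original weight."""
    return out_f * in_f


def macs_low_rank(out_f, in_f, rank):
    """MACs for B @ x followed by A @ z."""
    return rank * (out_f + in_f)


def break_even_rank(out_f, in_f):
    """Ranks below this value reduce the MAC count; ranks above it increase it."""
    return out_f * in_f / (out_f + in_f)


def select_layers(latencies):
    """Hypothetical hardware-aware selection step.

    `latencies` maps a layer name to (original_latency, factorized_latency),
    assumed to be profiled on the target hardware. A layer is compressed only
    if the factorized version is actually faster there.
    """
    return [name for name, (orig, fact) in latencies.items() if fact < orig]


rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))
A, B = low_rank_factorize(W, 128)
x = rng.standard_normal(2048)
y_approx = A @ (B @ x)                  # factorized forward pass
print(macs_dense(512, 2048), macs_low_rank(512, 2048, 128))
print(break_even_rank(512, 2048))
```

With these shapes, the dense layer costs 1,048,576 MACs and the rank-128 factorization costs 327,680. Any rank above 409.6 would make the layer more expensive. Even below that threshold, two smaller matrix multiplies may run slower than one large one on a given accelerator, which is why `select_layers` relies on measured latencies rather than MAC counts. The placeholder timings in any call to it would come from profiling; none are taken from the paper.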