Exploiting Elasticity in Tensor Ranks for Compressing Neural Networks
This work addresses model compression for deep neural networks, offering an incremental improvement by introducing a new elasticity dimension in tensor ranks.
The paper tackles neural network compression by exploiting elasticity in tensor ranks. It proposes a nuclear-norm rank minimization factorization (NRMF) approach that dynamically and globally searches for reduced tensor ranks during training, yielding a graceful tradeoff between model size and accuracy and outperforming the previous non-elastic variational Bayesian matrix factorization (VBMF) scheme in the reported experiments.
Elasticities in depth, width, kernel size and resolution have been explored in compressing deep neural networks (DNNs). Recognizing that the kernels in a convolutional neural network (CNN) are 4-way tensors, we further exploit a new elasticity dimension along the input-output channels. Specifically, a novel nuclear-norm rank minimization factorization (NRMF) approach is proposed to dynamically and globally search for the reduced tensor ranks during training. Correlation between tensor ranks across multiple layers is revealed, and a graceful tradeoff between model size and accuracy is obtained. Experiments then show the superiority of NRMF over the previous non-elastic variational Bayesian matrix factorization (VBMF) scheme.
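To make the link between the nuclear norm and rank reduction concrete, the following is a minimal sketch and not the NRMF procedure itself, whose details are not given here. It assumes the 4-way kernel is laid out as (C_out, C_in, k_h, k_w) and unfolded into a matrix with one row per output channel. It then applies singular value thresholding, the standard proximal step for a nuclear-norm penalty. The function names and the threshold `tau` are hypothetical, chosen for illustration only.

```python
import numpy as np


def unfold_kernel(kernel):
    """Flatten a (C_out, C_in, k_h, k_w) kernel into a (C_out, C_in*k_h*k_w) matrix.

    The layout and the choice of unfolding are assumptions made for this sketch.
    """
    return kernel.reshape(kernel.shape[0], -1)


def nuclear_norm(matrix):
    """Sum of singular values, a convex surrogate for matrix rank."""
    return float(np.linalg.svd(matrix, compute_uv=False).sum())


def singular_value_threshold(matrix, tau):
    """Shrink every singular value by tau and clip at zero.

    This is the proximal operator of tau * (nuclear norm). Singular values
    at or below tau vanish, so the returned rank can only drop as tau grows.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    rank = int(np.count_nonzero(s_shrunk))
    # Scale the columns of u by the shrunk singular values, then recombine.
    return (u * s_shrunk) @ vt, rank


# Illustration on a random kernel: a larger tau trades fidelity for lower rank.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((16, 8, 3, 3))
unfolded = unfold_kernel(kernel)
for tau in (0.0, 2.0, 4.0, 8.0):
    shrunk, rank = singular_value_threshold(unfolded, tau)
    print(f"tau={tau}: rank={rank}, nuclear norm={nuclear_norm(shrunk):.2f}")
```

In this sketch the threshold plays the role of the size-versus-accuracy knob: raising it removes more singular values and lowers the rank, at the cost of a larger approximation error. How NRMF chooses and couples the ranks across layers during training is not specified in the text above and is not modeled here.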