CV | Apr 12, 2022

Compact Model Training by Low-Rank Projection with Energy Transfer

arXiv:2204.05566v3 | 10 citations | h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient model compression for deep learning applications, offering a novel approach to low-rank training that could benefit resource-constrained deployments, though it is incremental in improving upon existing low-rank methods.

The paper tackles the problem of training low-rank compressed neural networks. Existing methods usually approximate a pre-trained model with low-rank factors and then re-train it, which degrades performance because a pre-trained model is a poor initialization under low-rank constraints. The paper proposes low-rank projection with energy transfer (LRPET), which trains from scratch by alternating stochastic gradient descent with projection onto a low-rank manifold, compensates for the energy lost in projection, and achieves competitive performance.

Low-rankness plays an important role in traditional machine learning, but is less popular in deep learning. Most previous low-rank network compression methods compress networks by approximating pre-trained models and re-training. However, the optimal solution in the Euclidean space may be quite different from the one with low-rank constraint. A well-pre-trained model is not a good initialization for the model with low-rank constraints. Thus, the performance of a low-rank compressed network degrades significantly. Compared with other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. We propose to alternately perform stochastic gradient descent training and projection of each weight matrix onto the corresponding low-rank manifold. Compared to re-training on the compact model, this enables full utilization of model capacity since the solution space is relaxed back to Euclidean space after projection. The matrix energy (the sum of squares of singular values) reduction caused by projection is compensated by energy transfer. We uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. In modern networks, a batch normalization (BN) layer can be merged into the previous convolution layer for inference, thereby influencing the optimal low-rank approximation of the previous layer. We propose BN rectification to cut off its effect on the optimal low-rank approximation, which further improves the performance.
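The alternation between training and projection described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that "energy transfer" means rescaling the kept singular values by a common factor so that the sum of squared singular values is unchanged, which is one plausible reading of the abstract. The function names `project_with_energy_transfer` and `train_low_rank`, the toy linear regression objective, the full-batch gradient step (used in place of SGD), and all hyperparameters are hypothetical choices for illustration. BN rectification is not covered.

```python
import numpy as np


def project_with_energy_transfer(weight, rank):
    """Truncate `weight` to `rank` and rescale to preserve matrix energy.

    Assumption: the energy of the pruned singular values is moved to the
    kept ones by multiplying them with a single common factor.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    kept = s[:rank]
    total_energy = np.sum(s ** 2)      # sum of squares of all singular values
    kept_energy = np.sum(kept ** 2)    # energy that survives truncation
    scale = np.sqrt(total_energy / kept_energy) if kept_energy > 0 else 1.0
    # Rebuild the rank-limited matrix from the rescaled singular values.
    return (u[:, :rank] * (kept * scale)) @ vt[:rank, :]


def train_low_rank(x, y, rank, steps=200, project_every=20, lr=0.01, seed=0):
    """Toy loop: unconstrained gradient steps with periodic projection."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[1], y.shape[1])) * 0.1
    for step in range(1, steps + 1):
        # Gradient of ||x @ w - y||_F^2 / (2 * n) with respect to w.
        grad = x.T @ (x @ w - y) / len(x)
        w -= lr * grad                 # update in the full Euclidean space
        if step % project_every == 0:
            w = project_with_energy_transfer(w, rank)
    return w
```

Because `steps` is a multiple of `project_every` under these defaults, the returned matrix comes straight from a projection and has rank at most `rank`. Between projections the weights are free to leave the low-rank manifold, which is the relaxation back to Euclidean space that the abstract refers to.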

Code Implementations: 1 repo
