MUSCO: Multi-Stage Compression of neural networks
This work addresses the problem of reducing the size and computational cost of deep learning models, but its contribution is incremental, since it builds on existing low-rank tensor approximation techniques.
The paper tackles neural network compression by proposing MUSCO, a multi-stage iterative method that alternates low-rank factorization with smart rank selection and fine-tuning, improving compression rates while maintaining accuracy across various tasks.
Low-rank tensor approximation is very promising for the compression of deep neural networks. We propose a new, simple, and efficient iterative approach that alternates low-rank factorization with smart rank selection and fine-tuning. We demonstrate the efficiency of our method compared to non-iterative ones. Our approach improves the compression rate while maintaining accuracy for a variety of tasks.
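The alternation described in the abstract (factorize, select a rank, fine-tune, repeat) can be illustrated with a minimal sketch on a single linear layer. This is not the authors' implementation. It makes several assumptions: the layer is a plain weight matrix factorized by truncated SVD rather than a tensor decomposition, the rank is chosen by a spectral energy threshold as a stand-in for the paper's "smart rank selection", and fine-tuning is plain gradient descent on a least-squares loss. All function names and parameters (select_rank, factorize, fine_tune, multi_stage_compress, energy, stages) are hypothetical.

```python
import numpy as np

def select_rank(W, energy=0.9):
    """Smallest rank keeping `energy` of the squared singular values.
    Assumption: a stand-in criterion, not the paper's rank selection."""
    s = np.linalg.svd(W, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return min(int(np.searchsorted(cum, energy)) + 1, len(s))

def factorize(W, rank):
    """Truncated SVD, W ~= A @ B, with singular values split evenly."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:rank])
    return U[:, :rank] * root, root[:, None] * Vt[:rank]

def fine_tune(A, B, X, Y, lr=0.05, steps=200):
    """Gradient descent on the mean squared error of X @ A @ B against Y."""
    n = X.shape[0]
    for _ in range(steps):
        R = X @ A @ B - Y              # residual, shape (n, d_out)
        grad_A = X.T @ R @ B.T / n
        grad_B = (X @ A).T @ R / n
        A = A - lr * grad_A
        B = B - lr * grad_B
    return A, B

def multi_stage_compress(W, X, Y, stages=3, energy=0.9):
    """Alternate rank selection, factorization, and fine-tuning."""
    W_cur, ranks = W, []
    A = B = None
    for _ in range(stages):
        r = select_rank(W_cur, energy)   # pick a rank for this stage
        A, B = factorize(W_cur, r)       # compress the current weights
        A, B = fine_tune(A, B, X, Y)     # recover accuracy
        W_cur = A @ B                    # input to the next stage
        ranks.append(r)
    return A, B, ranks

# Toy usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 20))
W = rng.standard_normal((20, 15)) / np.sqrt(20)
Y = X @ W
A, B, ranks = multi_stage_compress(W, X, Y)
print(ranks, A.shape, B.shape)
```

Each stage re-applies the rank criterion to the fine-tuned weights, so the rank can only stay the same or shrink from one stage to the next. A non-iterative baseline in this sketch would correspond to stages=1.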