Till the Layers Collapse: Compressing a Deep Neural Network through the Lenses of Batch Normalization Layers
The work addresses the high computational cost that deep neural networks impose on their users, but the contribution is incremental because it builds on existing compression techniques.
The paper tackles overparameterization in deep neural networks by introducing TLC, a method that compresses a network by reducing its depth through the lens of its batch normalization layers. The stated benefits are lower computational requirements and latency, validated on Swin-T, MobileNet-V2, and RoBERTa across image classification and NLP tasks.
Today, deep neural networks are widely used because they can handle a variety of complex tasks. This generality makes them powerful tools in modern technology. However, deep neural networks are often overparameterized, and using such large models consumes substantial computational resources. In this paper, we introduce a method called \textbf{T}ill the \textbf{L}ayers \textbf{C}ollapse (TLC), which compresses deep neural networks through the lenses of batch normalization layers. By reducing the depth of these networks, our method decreases their computational requirements and overall latency. We validate our method on popular models such as Swin-T, MobileNet-V2, and RoBERTa, across both image classification and natural language processing (NLP) tasks.
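
The abstract does not spell out the TLC procedure, so the sketch below is background only and is not the authors' algorithm. It illustrates one well-known property that makes batch normalization a natural vantage point for depth reduction: at inference time a batch normalization layer is a per-feature affine map, so it can be merged into a preceding linear layer, and two layers become one. All function names (`batchnorm_eval`, `linear`, `fold_batchnorm`) and the toy values are hypothetical, chosen for illustration.

```python
import math

def batchnorm_eval(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm on one feature vector, using fixed statistics."""
    return [g * (xi - m) / math.sqrt(v + eps) + b
            for xi, g, b, m, v in zip(x, gamma, beta, mean, var)]

def linear(x, weight, bias):
    """Compute y = W x + b, with the weight stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return (W', b') so that linear(x, W', b') equals batch norm applied to linear(x, W, b)."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    folded_w = [[s * w for w in row] for s, row in zip(scale, weight)]
    folded_b = [s * (b - m) + bt
                for s, b, m, bt in zip(scale, bias, mean, beta)]
    return folded_w, folded_b

# Toy values (assumptions for illustration, not taken from the paper).
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.0, 0.2]
mean, var = [0.05, -0.1], [0.9, 1.2]
x = [1.0, 2.0]

two_layers = batchnorm_eval(linear(x, W, b), gamma, beta, mean, var)
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
one_layer = linear(x, W_f, b_f)

# The single folded layer reproduces the linear + batch norm pair.
assert all(math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-12)
           for p, q in zip(two_layers, one_layer))
```

The folding is exact only in inference mode, where the mean and variance are fixed. If a nonlinearity sits between layers, they cannot be merged this way, which is presumably why a depth-reduction method has to decide which layers can be collapsed.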