LG · AI · Nov 29, 2023

LayerCollapse: Adaptive compression of neural networks

arXiv:2311.17943v3 · 2 citations · h-index: 68
Originality: Incremental advance
AI Analysis

This paper addresses the high computational cost and overfitting risk that practitioners face with large deep learning models. The advance is incremental because it builds on existing pruning techniques.

The paper tackles the challenge of compressing large, overparameterized Transformer models by introducing LayerCollapse, a structured pruning method that reduces the depth of fully connected layers with minimal performance impact. It reports compression without fine-tuning on sentiment analysis, text generation, and image classification benchmarks.

Handling the ever-increasing scale of contemporary deep learning and transformer-based models poses a significant challenge. Overparameterized Transformer networks outperform prior art in natural language processing and computer vision. These models contain hundreds of millions of parameters, demanding significant computational resources and making them prone to overfitting on downstream tasks. In this work, we present LayerCollapse, a novel structured pruning method to reduce the depth of fully connected layers. We propose an innovative regularizer that promotes shallow fully connected layers, compressing the model with minimal performance impact. This regularizer enables post-training compression without fine-tuning while preserving performance. LayerCollapse controls model expressiveness by regularizing the activation functions between fully connected layers, modulating them to linearity. A linear activation function collapses the rank of a transformation to the rank of the corresponding linear transformation, which demands fewer resources from the hardware. We demonstrate the effectiveness of LayerCollapse by showing its compression capabilities in sentiment analysis, text generation, and image classification benchmarks.
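The collapse step described in the abstract can be illustrated with a small sketch. It assumes the activation between two fully connected layers is a PReLU with a single slope `alpha`, and that the regularizer penalizes the distance of `alpha` from 1. Both are assumptions made for illustration and are not stated in the abstract. The helper names (`linearity_penalty`, `collapse`) are hypothetical. Once the activation is exactly linear (`alpha == 1`), the two layers compute `W2 (W1 x + b1) + b2`, which equals a single layer with weight `W2 W1` and bias `W2 b1 + b2`.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def prelu(v, alpha):
    """PReLU with one shared slope; alpha == 1 makes it the identity."""
    return [vi if vi >= 0 else alpha * vi for vi in v]


def mlp(x, W1, b1, W2, b2, alpha):
    """Two fully connected layers with a PReLU in between."""
    h = [a + b for a, b in zip(matvec(W1, x), b1)]
    h = prelu(h, alpha)
    return [a + b for a, b in zip(matvec(W2, h), b2)]


def linearity_penalty(alpha):
    """Assumed regularizer: zero when the activation is exactly linear."""
    return (1.0 - alpha) ** 2


def collapse(W1, b1, W2, b2):
    """Fold two layers joined by a linear activation into one layer."""
    W = matmul(W2, W1)
    b = [a + c for a, c in zip(matvec(W2, b1), b2)]
    return W, b


rng = random.Random(0)
d_in, d_hidden, d_out = 4, 16, 4
W1 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(d_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_out)]
b2 = [rng.uniform(-1, 1) for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

# With alpha == 1 the two-layer block and the collapsed layer agree.
W, b = collapse(W1, b1, W2, b2)
y_two_layers = mlp(x, W1, b1, W2, b2, alpha=1.0)
y_collapsed = [a + c for a, c in zip(matvec(W, x), b)]
max_err = max(abs(p - q) for p, q in zip(y_two_layers, y_collapsed))

# Parameter counts before and after collapsing (weights plus biases).
params_before = d_hidden * d_in + d_hidden + d_out * d_hidden + d_out
params_after = d_out * d_in + d_out
```

In this toy setting the hidden width is four times the input width, loosely mirroring a Transformer feed-forward block, so the collapsed layer stores far fewer parameters than the original pair. If `alpha` is only close to 1, the collapsed layer is an approximation rather than an exact replacement.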

