CL · Oct 5, 2020

Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior

arXiv:2010.01791v11000 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient model compression in NLP with a structured pruning technique that reduces model size while maintaining comparable performance. The contribution is incremental, since it builds on existing pruning methods.

The paper tackles the pruning of Transformer models with a structured method called spectral-normalized identity priors (SNIP). SNIP penalizes residual modules toward identity mappings so that unimportant non-linear mappings can be identified and discarded. With BERT on 5 GLUE tasks, it improves on the state of the art by 0.5 to 1.0% on average at a 50% compression ratio.
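The thresholding idea can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`mean_function_norm`, `prune_residual_modules`, `residual_forward`), the toy modules, and the threshold value are all hypothetical. It also assumes every module is scored on the same batch of inputs, and it skips the training step that pushes modules toward identity.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a plain Python list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def mean_function_norm(module, inputs):
    """Average L2 norm of the module's output f(x) over a batch of inputs."""
    return sum(l2_norm(module(x)) for x in inputs) / len(inputs)

def prune_residual_modules(modules, inputs, threshold):
    """Return a keep-mask; False means the block y = x + f(x) becomes y = x."""
    return [mean_function_norm(f, inputs) > threshold for f in modules]

def residual_forward(x, modules, keep_mask):
    """Apply y = x + f(x) for kept modules and skip (identity) the pruned ones."""
    for f, keep in zip(modules, keep_mask):
        if keep:
            fx = f(x)
            x = [a + b for a, b in zip(x, fx)]
    return x

# Toy stand-ins for attention heads or feed-forward sub-networks (assumed).
def strong(x):
    return [2.0 * v for v in x]

def weak(x):
    return [0.001 * v for v in x]

inputs = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
modules = [strong, weak]

# The weak module's output norm falls below the threshold, so it is dropped.
mask = prune_residual_modules(modules, inputs, threshold=0.1)
output = residual_forward([1.0, 2.0], modules, mask)
```

Because a pruned module reduces to the identity, removing it leaves the skip connection intact and changes the output only by the small amount that module used to contribute.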

Traditional (unstructured) pruning methods for a Transformer model focus on regularizing the individual weights by penalizing them toward zero. In this work, we explore spectral-normalized identity priors (SNIP), a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping. Our method identifies and discards unimportant non-linear mappings in the residual connections by applying a thresholding operator on the function norm. It is applicable to any structured module, including a single attention head, an entire attention block, or a feed-forward subnetwork. Furthermore, we introduce spectral normalization to stabilize the distribution of the post-activation values of the Transformer layers, further improving the pruning effectiveness of the proposed methodology. We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance. Specifically, we improve the performance over the state-of-the-art by 0.5 to 1.0% on average at 50% compression ratio.
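Spectral normalization itself can be sketched in a few lines. The abstract does not say which weights are normalized or how the norm is computed, so the following is an assumption: it divides a weight matrix by its largest singular value, estimated with power iteration. The function names and the example matrix are hypothetical.

```python
import math

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration."""
    Wt = transpose(W)
    # Deterministic start for reproducibility; a random start is the usual
    # choice, since a fixed vector can be orthogonal to the top direction.
    v = normalize([1.0] * len(W[0]))
    for _ in range(n_iters):
        u = normalize(matvec(W, v))
        v = normalize(matvec(Wt, u))
    return math.sqrt(sum(x * x for x in matvec(W, v)))

def spectral_normalize(W, n_iters=50):
    """Rescale W so that its largest singular value is approximately 1."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]

W = [[3.0, 0.0], [0.0, 1.0]]   # singular values are 3 and 1
sigma = spectral_norm(W)
W_sn = spectral_normalize(W)
```

Bounding the largest singular value limits how much a linear layer can stretch its input, which is one way to keep output norms on a comparable scale across layers before a single threshold is applied to them.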

Code Implementations: 1 repo
