$\rm SP^3$: Enhancing Structured Pruning via PCA Projection
This work addresses the need for smaller, more efficient language models that can be deployed in resource-constrained environments, and presents a novel method rather than an incremental improvement.
The paper tackles the problem of compressing the hidden dimension of pre-trained language models through structured pruning. It reduces the hidden dimension by 70% and compresses 94% of BERT$_{\rm base}$ while maintaining over 96% accuracy. At the same compression ratio, it achieves 6% higher accuracy than other methods that compress the hidden dimension.
Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension $d$, which is critical to both model size and efficiency. This paper introduces a novel structured pruning approach, Structured Pruning with PCA Projection (SP$^3$), which effectively reduces $d$ by projecting features into the space spanned by their principal components before masking. Extensive experiments on the GLUE and SQuAD benchmarks show that SP$^3$ can reduce $d$ by 70% and compress 94% of the BERT$_{\rm base}$ model while maintaining over 96% accuracy. At the same compression ratio, it achieves 6% higher accuracy than other methods that compress $d$. SP$^3$ has also proven effective on other models, including OPT and Llama. Our data and code are available in an anonymous repository.
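The central idea, rotating hidden features into a principal-component basis and only then masking dimensions, can be illustrated with a minimal NumPy sketch. It assumes that PCA is estimated from a sample of hidden states and that the masked projection is folded into the weight of the following linear layer. The function names `pca_components` and `project_and_mask` are hypothetical, and this is not the authors' implementation, which may differ in how components are estimated, how masks are chosen, and where projections sit inside the Transformer.

```python
import numpy as np


def pca_components(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the feature mean and the principal components as columns,
    ordered by decreasing explained variance."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eigh is for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def project_and_mask(features: np.ndarray, weight: np.ndarray, keep: int):
    """Hypothetical helper: rotate features into the PCA basis, keep only the
    first `keep` coordinates (masking the rest), and fold the projection into
    the next linear layer so that it consumes the reduced hidden dimension."""
    mean, components = pca_components(features)
    q_k = components[:, :keep]          # d x keep: retained principal directions
    reduced = (features - mean) @ q_k   # n x keep: compressed hidden states
    reduced_weight = q_k.T @ weight     # keep x d_out: folded layer weight
    bias = mean @ weight                # d_out: restores the subtracted mean
    return reduced, reduced_weight, bias


# Toy data (an assumption): width-64 hidden states that mostly lie in 8 directions.
rng = np.random.default_rng(0)
n, d, rank, d_out = 512, 64, 8, 16
features = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, d))
features += 0.01 * rng.normal(size=(n, d))
weight = rng.normal(size=(d, d_out))

reduced, reduced_weight, bias = project_and_mask(features, weight, keep=rank)
approx = reduced @ reduced_weight + bias
exact = features @ weight
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(reduced.shape, reduced_weight.shape, f"relative error: {rel_err:.4f}")
```

Masking in the PCA basis is what makes the reduction cheap. The leading components concentrate most of the variance, so dropping the trailing ones loses little. Dropping raw coordinates of $d$ instead would discard signal that is spread across all of them. On this synthetic data, keeping 8 of 64 dimensions reproduces the layer output closely. Real PLM hidden states are unlikely to be this concentrated.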