CL AI Apr 5, 2023

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

arXiv:2304.02721v326.2222 citations h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses the difficulty of deploying summarization models in latency-sensitive or web-scale applications. It is an incremental advance because it builds on existing pruning techniques.

The paper tackles inference efficiency in sequence-to-sequence summarization models by studying structured pruning. It shows that asymmetric pruning can reduce inference latency by nearly 3x at a cost of only about 1 point of ROUGE-2.

Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise. Still, model sizes can make deployment in latency-sensitive or web-scale implementations difficult. This paper studies the relationship between model size, structured pruning, inference efficiency, and summarization accuracy on widely used summarization datasets. We show that model accuracy is tied to the encoder size while inference efficiency is connected to the decoder. Using asymmetric pruning can lead to nearly 3x improvement in inference latency with ~1 point loss in Rouge-2. Moreover, we find both the average degradation and the role of asymmetry to be consistent across model sizes and variations in datasets.
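One way to see why decoder size would drive latency while encoder size would not is to count sequential layer passes during autoregressive generation. The sketch below is a toy cost model, not the paper's measurement: the function name `layer_passes`, the layer counts, and the 64-token output length are all assumptions chosen for illustration, and the model ignores per-layer cost differences, caching, and batching.

```python
def layer_passes(enc_layers: int, dec_layers: int, gen_tokens: int) -> int:
    """Count sequential layer passes needed to generate one summary.

    Assumption: the encoder stack runs once over the input, while the
    decoder stack runs once for every generated token.
    """
    return enc_layers + dec_layers * gen_tokens


GEN_TOKENS = 64  # hypothetical summary length, not taken from the paper

# Hypothetical 12-layer encoder and 12-layer decoder as the baseline.
baseline = layer_passes(12, 12, GEN_TOKENS)
pruned_encoder = layer_passes(6, 12, GEN_TOKENS)  # halve the encoder only
pruned_decoder = layer_passes(12, 6, GEN_TOKENS)  # halve the decoder only

print(f"encoder-pruned speedup: {baseline / pruned_encoder:.2f}x")
print(f"decoder-pruned speedup: {baseline / pruned_decoder:.2f}x")
```

Under these assumptions, halving the encoder barely changes the count, while halving the decoder nearly halves it. That agrees in direction with the abstract's claim that inference efficiency is connected to the decoder; it says nothing about accuracy, which the abstract ties to the encoder.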
