CL AI LG | Oct 16, 2023

NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models

Jongwoo Ko, Seungjoon Park, Yujin Kim, Sumyeong Ahn, Du-Seong Chang, Euijai Ahn, Se-Young Yun

arXiv:2310.10054v121.5134 citations | h-index: 10 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the need for faster inference in encoder-decoder models for NLP tasks, but the advance is incremental because it builds on existing structured pruning methods.

The paper tackles the problem of accelerating encoder-decoder language models through structured pruning. It proposes the NASH framework, which narrows the encoder and shortens the decoder. Experiments indicate that this improves inference speed while maintaining output quality.

Structured pruning methods have proven effective in reducing model size and accelerating inference in various network architectures such as Transformers. Despite the versatility of encoder-decoder models in numerous NLP tasks, structured pruning methods for such models are less explored than those for encoder-only models. In this study, we investigate the behavior of structured pruning on encoder-decoder models from a decoupled perspective, pruning the encoder and decoder components separately. Our findings highlight two insights: (1) the number of decoder layers is the dominant factor of inference speed, and (2) low sparsity in the pruned encoder network enhances generation quality. Motivated by these findings, we propose a simple and effective framework, NASH, that narrows the encoder and shortens the decoder networks of encoder-decoder models. Extensive experiments on diverse generation and inference tasks validate the effectiveness of our method in both speedup and output quality.
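One way to build intuition for insight (1) is a toy count of sequential layer calls during autoregressive generation. This is only a sketch and not the paper's measurement. It assumes the encoder processes the whole input in one pass while the decoder is re-run once per generated token. The function name `sequential_layer_calls`, the 12-layer baseline, and the 64-token output length are hypothetical values chosen for illustration. Layer width, attention cost, and caching are ignored.

```python
def sequential_layer_calls(enc_layers: int, dec_layers: int, gen_tokens: int) -> int:
    """Count layer forward passes that must run one after another.

    Assumption (hypothetical cost model, not from the paper): the encoder
    runs once over the full input, and the decoder stack runs once for
    every generated token.
    """
    return enc_layers + dec_layers * gen_tokens


# Hypothetical configurations, chosen only to illustrate the asymmetry.
base = sequential_layer_calls(enc_layers=12, dec_layers=12, gen_tokens=64)
short_decoder = sequential_layer_calls(enc_layers=12, dec_layers=3, gen_tokens=64)
short_encoder = sequential_layer_calls(enc_layers=3, dec_layers=12, gen_tokens=64)

print(base, short_decoder, short_encoder)  # prints: 780 204 771
```

Under these assumptions, removing nine decoder layers cuts the sequential work by roughly three quarters, while removing nine encoder layers barely changes it. That asymmetry is consistent with the abstract's claim that decoder depth dominates inference speed.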
