Task-specific Compression for Multi-task Language Models using Attribution-based Pruning
This work addresses efficiency issues for users of multi-task language models by enabling task-specific compression without retraining. The contribution is incremental, since it builds on existing pruning techniques.
The paper tackles the problem that multi-task language models use an unnecessarily large number of parameters when applied to a single task. It proposes a training-free compression method based on attribution-based pruning, which significantly outperforms baseline pruning methods on six datasets and preserves performance in unseen domains.
Multi-task language models show outstanding performance on various natural language understanding tasks with only a single model. However, these language models use an unnecessarily large number of model parameters, even when they are used for only one specific task. This paper proposes a novel training-free compression method for multi-task language models based on pruning. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task. We then prune the neurons that are unimportant for that task, leaving only the task-specific parameters. Furthermore, we extend our method to be applicable in low-resource and unsupervised settings. Since our compression method is training-free, it uses few computing resources and does not destroy the pre-trained knowledge of language models. Experimental results on six widely used datasets show that our proposed pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen domain setting.
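To make the idea of attribution-based, training-free pruning concrete, the following is a minimal sketch on a toy one-hidden-layer network. It is not the paper's implementation. The abstract does not specify the attribution method, so the sketch assumes a simple "activation times gradient" score averaged over task samples and ranked by absolute value. All names (`neuron_attributions`, `prune_neurons`, `forward`) and the toy weights are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def hidden_activations(W1: Matrix, b1: Vector, x: Vector) -> Vector:
    """ReLU hidden layer: h_j = max(0, W1[j] . x + b1[j])."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def forward(W1: Matrix, b1: Vector, w2: Vector, x: Vector) -> float:
    """Scalar task output: a linear readout of the hidden layer."""
    h = hidden_activations(W1, b1, x)
    return sum(hj * wj for hj, wj in zip(h, w2))


def neuron_attributions(W1: Matrix, b1: Vector, w2: Vector,
                        samples: List[Vector]) -> Vector:
    """Assumed attribution: activation x gradient, averaged over task samples.

    With a linear readout, d(output)/d(h_j) is simply w2[j].
    """
    scores = [0.0] * len(W1)
    for x in samples:
        h = hidden_activations(W1, b1, x)
        for j, (hj, wj) in enumerate(zip(h, w2)):
            scores[j] += hj * wj
    return [s / len(samples) for s in scores]


def prune_neurons(W1: Matrix, b1: Vector, w2: Vector, scores: Vector,
                  keep: int) -> Tuple[Matrix, Vector, Vector, List[int]]:
    """Keep the `keep` neurons with the largest |attribution|; no retraining."""
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]),
                    reverse=True)
    kept = sorted(ranked[:keep])
    return ([W1[j] for j in kept], [b1[j] for j in kept],
            [w2[j] for j in kept], kept)


# Toy weights (hypothetical): the third neuron contributes almost nothing.
W1 = [[1.0, 0.0], [0.0, 1.0], [0.01, 0.01]]
b1 = [0.0, 0.0, 0.0]
w2 = [1.0, -1.0, 0.1]
task_samples = [[1.0, 2.0], [2.0, 1.0]]

scores = neuron_attributions(W1, b1, w2, task_samples)
W1_p, b1_p, w2_p, kept = prune_neurons(W1, b1, w2, scores, keep=2)
```

In this sketch the pruned network is obtained purely by slicing the existing weights, which mirrors the training-free property described above. Whether signed or absolute scores are used for ranking, and how scores are aggregated across samples, are design choices assumed here for illustration.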