Revisiting Large Language Model Pruning using Neuron Semantic Attribution
This work identifies limitations in existing pruning methods for large language models, which matters to researchers and practitioners seeking efficient model deployment. It is incremental in that it builds on existing pruning techniques.
The study evaluates existing pruning methods on 24 datasets and 4 tasks, finding that the calibration set significantly affects performance and observing a notable performance drop on sentiment classification tasks. To explain this drop, the authors propose Neuron Semantic Attribution, which learns to associate each neuron with specific semantics and thereby makes the unpruned neurons explainable.
Model pruning is vital for accelerating large language models (LLMs) by reducing their size and computational requirements. However, the generalizability of existing pruning methods across diverse datasets and tasks remains unclear. Thus, we conduct extensive evaluations on 24 datasets and 4 tasks using popular pruning methods. Based on these evaluations, we find that the calibration set greatly affects the performance of pruning methods, and we investigate this effect. In addition, we surprisingly find a significant performance drop of existing pruning methods on sentiment classification tasks. To understand the link between this performance drop and the pruned neurons, we propose Neuron Semantic Attribution, which learns to associate each neuron with specific semantics. This method is the first to make the unpruned neurons of LLMs explainable.
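To illustrate why the calibration set can change which weights are pruned, the following is a minimal sketch, assuming an activation-aware score of the form |w_ij| * ||x_j||_2 computed over calibration inputs. The metric, the function names (`column_norms`, `prune_mask`), and the toy data are illustrative assumptions, not the methods or data evaluated in this work.

```python
import math

def column_norms(calib):
    """L2 norm of each input feature over the calibration samples."""
    n_features = len(calib[0])
    return [math.sqrt(sum(row[j] ** 2 for row in calib)) for j in range(n_features)]

def prune_mask(weights, calib, sparsity=0.5):
    """Return a 0/1 mask per weight row; 0 marks a pruned weight.

    Score = |w_ij| * ||x_j||_2 (an assumed activation-aware metric,
    used here only for illustration).
    """
    norms = column_norms(calib)
    mask = []
    for row in weights:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        n_prune = int(len(row) * sparsity)
        # Indices of the lowest-scoring weights in this row are pruned.
        pruned = set(sorted(range(len(row)), key=lambda j: scores[j])[:n_prune])
        mask.append([0 if j in pruned else 1 for j in range(len(row))])
    return mask

# One output neuron with four input weights (hypothetical values).
weights = [[0.9, 0.5, 0.4, 0.1]]

# Two hypothetical calibration sets with different activation statistics.
calib_a = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
calib_b = [[0.1, 0.1, 1.0, 5.0], [0.1, 0.2, 1.0, 5.0]]

mask_a = prune_mask(weights, calib_a)  # [[1, 1, 0, 0]]
mask_b = prune_mask(weights, calib_b)  # [[0, 0, 1, 1]]
print(mask_a, mask_b)
```

With uniform activations (`calib_a`) the score reduces to weight magnitude, so the two smallest weights are removed. With `calib_b`, the large activations on the last two features protect their small weights, and the opposite pair is pruned. Under this assumed metric, the same weights at the same sparsity yield a different pruned model depending only on the calibration data.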