LG · Jan 2, 2023

SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

arXiv:2301.00774v3 · 1310 citations · h-index: 41 · Has Code

Originality: Incremental advance

AI Analysis

SparseGPT addresses the computational and memory bottlenecks of deploying massive language models, though the advance is incremental because it builds on existing pruning techniques.

The paper tackles efficient pruning of large GPT models by introducing SparseGPT, a method that prunes models such as OPT-175B to 50-60% sparsity in one shot, without retraining, in under 4.5 hours and with a negligible increase in perplexity.

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
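To make the sparsity patterns in the abstract concrete, here is a minimal sketch of what an n:m semi-structured mask looks like: in every group of m consecutive weights, exactly n are zeroed, so 2:4 and 4:8 both give 50% sparsity. The function names are hypothetical, and choosing which weights to drop by smallest magnitude is an assumption made purely for illustration; it is not the selection rule SparseGPT uses. The last lines do the rough arithmetic behind the "more than 100 billion weights" claim, assuming a nominal 175 billion parameters and treating all of them as prunable.

```python
def nm_magnitude_mask(row, n=2, m=4):
    """Return a 0/1 keep-mask with exactly n zeros in every group of m weights.

    Assumption: the weights to drop are the n smallest in magnitude per group.
    This is a simple stand-in, NOT the criterion used by SparseGPT.
    """
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = []
    for start in range(0, len(row), m):
        group = row[start:start + m]
        # Indices of the n smallest-magnitude weights in this group.
        drop = set(sorted(range(m), key=lambda i: abs(group[i]))[:n])
        mask.extend(0 if i in drop else 1 for i in range(m))
    return mask


def sparsity(mask):
    """Fraction of weights that are masked out (zeroed)."""
    return mask.count(0) / len(mask)


# Toy weight row (made-up values, for illustration only).
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.8, 0.01]
mask_2_4 = nm_magnitude_mask(weights, n=2, m=4)
mask_4_8 = nm_magnitude_mask(weights, n=4, m=8)
pruned = [w * keep for w, keep in zip(weights, mask_2_4)]

# Rough count of zeroed weights at 60% unstructured sparsity,
# assuming a nominal 175 billion parameters that are all prunable.
nominal_params = 175_000_000_000
zeroed_at_60_percent = nominal_params * 60 // 100
```

Unstructured sparsity places no such per-group constraint: any 60% of the weights may be zero, which is why it can reach higher sparsity than the fixed 50% of the 2:4 and 4:8 patterns.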

