LG, AI, CV · May 18, 2023

PDP: Parameter-free Differentiable Pruning is All You Need

arXiv:2305.11203v3 · 19 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need for simpler, more effective pruning methods that reduce model size and improve efficiency on vision and language tasks, though it appears incremental because it builds on existing differentiable pruning approaches.

The paper tackles DNN pruning by proposing Parameter-free Differentiable Pruning (PDP), an efficient train-time pruning scheme that reports state-of-the-art results in model size, accuracy, and training cost across vision and language tasks. For example, it improves MobileNet-v1 accuracy by 1.7% at 86.6% sparsity and BERT accuracy by 1.6% at 90% sparsity.

DNN pruning is a popular way to reduce the size of a model, improve the inference latency, and minimize the power consumption on DNN accelerators. However, existing approaches might be too complex, expensive or ineffective to apply to a variety of vision/language tasks, DNN architectures and to honor structured pruning constraints. In this paper, we propose an efficient yet effective train-time pruning scheme, Parameter-free Differentiable Pruning (PDP), which offers state-of-the-art qualities in model size, accuracy, and training cost. PDP uses a dynamic function of weights during training to generate soft pruning masks for the weights in a parameter-free manner for a given pruning target. While differentiable, the simplicity and efficiency of PDP make it universal enough to deliver state-of-the-art random/structured/channel pruning results on various vision and natural language tasks. For example, for MobileNet-v1, PDP can achieve 68.2% top-1 ImageNet1k accuracy at 86.6% sparsity, which is 1.7% higher accuracy than those from the state-of-the-art algorithms. Also, PDP yields over 83.1% accuracy on Multi-Genre Natural Language Inference with 90% sparsity for BERT, while the next best from the existing techniques shows 81.5% accuracy. In addition, PDP can be applied to structured pruning, such as N:M pruning and channel pruning. For 1:4 structured pruning of ResNet18, PDP improved the top-1 ImageNet1k accuracy by over 3.6% over the state-of-the-art. For channel pruning of ResNet50, PDP reduced the top-1 ImageNet1k accuracy by 0.6% from the state-of-the-art.
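To make the idea of a soft mask computed from the weights themselves more concrete, here is a minimal sketch in plain Python. The abstract does not give the mask function, so the specific form used here is an assumption for illustration only: a threshold `t` is read off the sorted magnitudes for the requested sparsity, and each weight receives `sigmoid((w^2 - t^2) / temperature)`. The names `soft_prune_mask`, `n_m_mask`, and `temperature` are hypothetical. The second helper shows the usual meaning of N:M structured sparsity as a hard mask (keep the N largest-magnitude weights in every group of M); it is not the paper's soft variant.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_prune_mask(weights, sparsity, temperature=1e-2):
    """Return a soft keep-mask in [0, 1] for each weight.

    ASSUMPTION: the mask form below is illustrative; the text above only
    says the mask is a dynamic function of the weights with no extra
    learnable parameters.
    """
    mags = sorted(abs(w) for w in weights)
    n_prune = int(round(sparsity * len(mags)))
    if n_prune <= 0:
        return [1.0] * len(weights)
    if n_prune >= len(mags):
        return [0.0] * len(weights)
    # Threshold: midpoint between the largest magnitude to be pruned
    # and the smallest magnitude to be kept.
    t = 0.5 * (mags[n_prune - 1] + mags[n_prune])
    return [_sigmoid((w * w - t * t) / temperature) for w in weights]


def n_m_mask(weights, n, m):
    """Hard N:M mask: keep the n largest-magnitude weights per group of m."""
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    mask = [0] * len(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        keep = sorted(group, key=lambda i: abs(weights[i]), reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.1, 0.02, -0.3]
soft = soft_prune_mask(weights, sparsity=0.5)
masked = [w * m for w, m in zip(weights, soft)]  # soft-pruned weights
hard_1_4 = n_m_mask(weights, n=1, m=4)
```

In this sketch, lowering `temperature` pushes the soft mask toward a hard 0/1 decision, while a larger value leaves weights near the threshold partially active. In an actual training loop the mask would be recomputed from the current weights at each step and applied inside an autodiff framework so gradients flow through it; that part is omitted here.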

