LG AI · Feb 17, 2025

Fishing For Cheap And Efficient Pruners At Initialization

Ivo Gollini Navarrete, Nicolas Mauricio Cuadrado, Jose Renato Restom, Martin Takáč, Samuel Horváth

arXiv:2502.11450v17.11 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying large deep neural networks in resource-constrained settings by providing a more efficient pruning method. The advance is incremental, since it builds on existing saliency and sensitivity principles.

The paper tackles the problem of pruning neural networks before training to reduce computational costs. It introduces Fisher-Taylor Sensitivity (FTS), a cheap and efficient criterion that achieves competitive performance against state-of-the-art methods on ResNet18 and VGG19 with CIFAR-10 and CIFAR-100, even under extreme sparsity conditions.

Pruning offers a promising way to mitigate the costs and environmental impact of deploying large deep neural networks (DNNs). Traditional approaches rely on computationally expensive trained models or time-consuming iterative prune-retrain cycles, which undermines their utility in resource-constrained settings. To address this issue, we build on the established principles of saliency (LeCun et al., 1989) and connection sensitivity (Lee et al., 2018) to tackle the challenging problem of one-shot pruning of neural networks (NNs) at initialization, that is, pruning before training (PBT). We introduce Fisher-Taylor Sensitivity (FTS), a computationally cheap and efficient pruning criterion based on the diagonal of the empirical Fisher Information Matrix (FIM), which offers a viable way to integrate first- and second-order information when identifying a model's structurally important parameters. Although the FIM-Hessian equivalence holds only for converged models that maximize the likelihood, recent studies (Karakida et al., 2019) suggest that, even at initialization, the FIM captures essential geometric information about the parameters of overparameterized NNs, which provides the basis for our method. Finally, we demonstrate empirically that layer collapse, a critical limitation of data-dependent pruning methods, is easily overcome by pruning within a single training epoch after initialization. We perform experiments on ResNet18 and VGG19 with CIFAR-10 and CIFAR-100, widely used benchmarks in pruning research. Our method achieves competitive performance against state-of-the-art techniques for one-shot PBT, even under extreme sparsity conditions. Our code is made available to the public.
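To make the idea of a Fisher-based sensitivity score more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not give the exact FTS formula, so the score used here is an assumption: the second-order Taylor estimate of the loss change from zeroing a weight, |-g_i w_i + 0.5 F_ii w_i^2|, with the Hessian diagonal replaced by the empirical Fisher diagonal. The toy logistic-regression model and all function names are hypothetical and chosen only for illustration.

```python
import math
import random


def per_sample_grads(w, xs, ys):
    """Per-sample gradients of the logistic-regression negative log-likelihood."""
    grads = []
    for x, y in zip(xs, ys):
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        grads.append([(p - y) * xi for xi in x])
    return grads


def empirical_fisher_diag(grads):
    """Diagonal of the empirical FIM: mean of squared per-sample gradients."""
    n = len(grads)
    return [sum(g[i] ** 2 for g in grads) / n for i in range(len(grads[0]))]


def mean_grad(grads):
    """Mean gradient over the batch (the first-order term)."""
    n = len(grads)
    return [sum(g[i] for g in grads) / n for i in range(len(grads[0]))]


def taylor_fisher_scores(w, grads):
    """ASSUMED score, not taken from the paper: |-g_i w_i + 0.5 F_ii w_i^2|."""
    g = mean_grad(grads)
    f = empirical_fisher_diag(grads)
    return [abs(-gi * wi + 0.5 * fi * wi * wi) for gi, fi, wi in zip(g, f, w)]


def one_shot_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of parameters in a single shot."""
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]


# Toy usage: score randomly initialized weights on random data, then prune 75%.
random.seed(0)
dim, n_samples = 8, 32
w0 = [random.gauss(0.0, 1.0) for _ in range(dim)]
xs = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
ys = [random.randint(0, 1) for _ in range(n_samples)]

scores = taylor_fisher_scores(w0, per_sample_grads(w0, xs, ys))
mask = one_shot_mask(scores, sparsity=0.75)
pruned_w = [wi * mi for wi, mi in zip(w0, mask)]
```

The sketch uses a single global threshold over all parameters. In a multi-layer network a global threshold can, in principle, remove every weight of one layer, which is the layer-collapse failure mode the abstract refers to; a per-layer check on the mask would be needed to detect it.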
