CV | AI | LG | Feb 25, 2025

Optimal Brain Apoptosis

arXiv:2502.17941v2 | 3 citations | h-index: 9 | Has Code | ICLR
Originality: Incremental advance
AI Analysis

This work uses pruning to reduce the computational demands of CNNs and Transformers. The advance is incremental: it extends the foundational Optimal Brain Damage (OBD) method.

The paper tackles the computational inefficiency of large neural networks by introducing Optimal Brain Apoptosis (OBA), a pruning method that directly calculates Hessian-vector products for precise parameter removal, achieving competitive performance on datasets like CIFAR10, CIFAR100, and ImageNet with models such as VGG19 and ResNet50.

The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without heavily compromising performance. This paper builds on the foundational work of Optimal Brain Damage (OBD) by advancing the methodology of parameter importance estimation using the Hessian matrix. Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that calculates the Hessian-vector product value directly for each parameter. By decomposing the Hessian matrix across network layers and identifying conditions under which inter-layer Hessian submatrices are non-zero, we propose a highly efficient technique for computing the second-order Taylor expansion of parameters. This approach allows for a more precise pruning process, particularly in the context of CNNs and Transformers, as validated in our experiments with VGG19, ResNet32, ResNet50, and ViT-B/16 on the CIFAR10, CIFAR100, and ImageNet datasets. Our code is available at https://github.com/NEU-REAL/OBA.
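
To make the second-order Taylor idea in the abstract concrete, here is a minimal sketch in pure Python. It is not the OBA algorithm and does not reproduce the paper's layer-wise Hessian decomposition. It only compares a diagonal-only (OBD-style) estimate of the loss change from zeroing parameters against an estimate that uses a full Hessian-vector product. The toy quadratic loss, the values of `A`, `b`, and `theta`, and every function name are assumptions made for illustration, and the Hessian-vector product is approximated by finite differences of the gradient rather than by automatic differentiation.

```python
# Toy quadratic loss L(theta) = 0.5 * theta^T A theta - b^T theta.
# A, b, theta are hypothetical values chosen for illustration only.
A = [[2.0, 0.8, 0.0],
     [0.8, 1.0, 0.3],
     [0.0, 0.3, 1.5]]  # symmetric, so A is also the Hessian of the loss
b = [1.0, 0.5, -0.2]
theta = [0.9, -0.4, 0.6]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def loss(t):
    return 0.5 * dot(t, matvec(A, t)) - dot(b, t)


def grad(t):
    return [a - bi for a, bi in zip(matvec(A, t), b)]


def hvp(t, v, eps=1e-4):
    """Hessian-vector product H @ v from a central difference of the
    gradient; the Hessian itself is never formed."""
    plus = grad([ti + eps * vi for ti, vi in zip(t, v)])
    minus = grad([ti - eps * vi for ti, vi in zip(t, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def prune_estimates(t, pruned):
    """Estimate the loss change from setting the parameters in `pruned` to 0.

    Returns (diagonal-only estimate, full second-order estimate, true change).
    """
    delta = [-ti if i in pruned else 0.0 for i, ti in enumerate(t)]
    first_order = dot(grad(t), delta)

    # Full second-order term: 0.5 * delta^T H delta, using one HVP.
    full = first_order + 0.5 * dot(delta, hvp(t, delta))

    # Diagonal-only term: 0.5 * sum_i H_ii * delta_i^2 (ignores cross terms).
    diag = first_order
    for i in pruned:
        unit = [1.0 if j == i else 0.0 for j in range(len(t))]
        diag += 0.5 * hvp(t, unit)[i] * delta[i] ** 2

    actual = loss([ti + di for ti, di in zip(t, delta)]) - loss(t)
    return diag, full, actual


diag_est, full_est, actual = prune_estimates(theta, {0, 1})
print(f"diagonal-only: {diag_est:.4f}  full HVP: {full_est:.4f}  actual: {actual:.4f}")
```

Because the toy loss is exactly quadratic, the full second-order estimate matches the true loss change up to floating-point error, while the diagonal-only estimate misses the interaction between parameters 0 and 1. On a real network the expansion is only an approximation, and the Hessian-vector product would normally come from automatic differentiation.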

Code Implementations: 1 repo