LG · Feb 27, 2024

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization

arXiv:2402.17902v2 · 1 citation · h-index: 61 · NIPS
AI Analysis

The paper addresses the problem of building scalable and interpretable models for machine learning practitioners. Its contribution is incremental in that it builds on prior work in pruning.

The paper tackles neural network pruning by uniting differentiable pruning with combinatorial optimization to select important sparse parameters, achieving state-of-the-art results in block-wise pruning on ImageNet and Criteo datasets.

Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models. Prior work on the subject has developed largely along two orthogonal directions: (1) differentiable pruning for efficiently and accurately scoring the importance of parameters, and (2) combinatorial optimization for efficiently searching over the space of sparse models. We unite the two approaches, both theoretically and empirically, to produce a coherent framework for structured neural network pruning in which differentiable pruning guides combinatorial optimization algorithms to select the most important sparse set of parameters. Theoretically, we show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization, and prove that for a wide class of nonconvex regularizers, the global optimum is unique, group-sparse, and provably yields an approximate solution to a sparse convex optimization problem. The resulting algorithm that we propose, SequentialAttention++, advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
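To make the idea of score-guided block selection concrete, here is a minimal sketch. It is not the SequentialAttention++ algorithm from the paper. It only illustrates the general pattern the abstract describes: a differentiable score (here a softmax over per-block logits) ranks blocks, and a simple selection step (here a one-shot top-k, standing in for a combinatorial search) chooses which blocks survive. The function names, the 2x2 block size, and the logit values are all assumptions made for illustration.

```python
import numpy as np

def block_scores(logits):
    """Softmax over per-block logits: a differentiable importance score (assumed form)."""
    z = logits - logits.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def block_mask(shape, block, keep_idx):
    """Build a 0/1 mask that keeps only the blocks listed in keep_idx (row-major order)."""
    rows, cols = shape
    br, bc = block
    blocks_per_row = cols // bc
    mask = np.zeros(shape)
    for idx in keep_idx:
        r, c = divmod(int(idx), blocks_per_row)
        mask[r * br:(r + 1) * br, c * bc:(c + 1) * bc] = 1.0
    return mask

def prune_blocks(weights, logits, block, k):
    """Keep the k highest-scoring blocks and zero out the rest."""
    scores = block_scores(logits)
    keep = np.argsort(scores)[::-1][:k]    # indices of the top-k blocks
    mask = block_mask(weights.shape, block, keep)
    return weights * mask, np.sort(keep)

# Hypothetical example: a 4x4 weight matrix split into four 2x2 blocks.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
logits = np.array([2.0, -1.0, 0.5, 3.0])   # one illustrative logit per block
pruned, kept = prune_blocks(W, logits, block=(2, 2), k=2)
```

In this toy setup the logits are fixed by hand. In a differentiable pruning method they would be trained jointly with the weights, and the selection step could be repeated or refined rather than done once.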

