LG · ML · Sep 28, 2025

Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

arXiv:2509.23898v3 · 1 citation · h-index: 15
Originality: Highly original
AI Analysis

This work addresses the challenge of compacting neural networks efficiently and offers practitioners a theoretically grounded, versatile solution, though it is incremental in that it builds on existing structured sparsity regularization.

The paper tackles structured sparsity regularization in neural networks, whose penalties are non-differentiable and therefore incompatible with standard gradient descent. It proposes D-Gating, a differentiable overparameterization method that achieves strong performance-sparsity tradeoffs across vision, language, and tabular tasks and outperforms both direct optimization of structured penalties and conventional pruning baselines.

Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum using non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.
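
The link between the gated parameterization and the $L_{2,2/D}$ penalty can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes that a group's effective weight is a primary vector multiplied by $D-1$ scalar gates, and that a plain squared $L_2$ penalty is placed on every factor; the abstract does not spell out either detail. All function names are hypothetical. Under these assumptions, the AM-GM inequality implies that the smooth penalty is never smaller than $D\,\lVert w\rVert_2^{2/D}$ and matches it exactly when all factors have equal norm.

```python
import numpy as np

def gated_weight(v, gates):
    """Effective group weight: primary vector times the product of scalar gates."""
    return v * np.prod(gates)

def l2_penalty(v, gates):
    """Smooth surrogate (assumed form): squared L2 norm of every factor, summed."""
    return float(np.dot(v, v) + np.sum(np.square(gates)))

def group_penalty(w, D):
    """Non-smooth target for one group: D * ||w||_2^(2/D)."""
    return float(D * np.linalg.norm(w) ** (2.0 / D))

def balanced_factors(w, D):
    """Split w into a vector and D-1 gates that all have norm ||w||^(1/D).

    This is the equality case of AM-GM, where the smooth penalty is minimal.
    """
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return np.zeros_like(w), np.zeros(D - 1)
    scale = norm ** (1.0 / D)
    v = w / norm * scale
    gates = np.full(D - 1, scale)
    return v, gates

if __name__ == "__main__":
    w = np.array([3.0, 4.0])  # one weight group with ||w||_2 = 5
    for D in (2, 3, 4):
        v, gates = balanced_factors(w, D)
        unbalanced = l2_penalty(w, np.ones(D - 1))  # same w, gates fixed at 1
        print(D, l2_penalty(v, gates), group_penalty(w, D), unbalanced)
```

For $D = 2$ the target reduces to $2\lVert w\rVert_2$, a scaled group lasso penalty, while larger $D$ gives a non-convex penalty with exponent $2/D < 1$. Any factorization that is not balanced, such as leaving all gates at 1, yields a strictly larger smooth penalty for the same effective weight.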

