LG | Jun 13, 2025

Dynamic Sparse Training of Diagonally Sparse Networks

arXiv:2506.11449v1 | 4 citations | h-index: 29 | Has Code | ICML
Originality: Incremental advance
AI Analysis

This work addresses the hardware inefficiency of sparse neural networks, a practical concern for practitioners. Its structured approach is an incremental advance over existing dynamic sparse training methods.

The paper tackles the problem that unstructured sparsity in neural networks often does not translate into practical speedups on hardware. It proposes DynaDiag, a structured sparse-to-sparse training method that enforces a diagonal sparsity pattern. With 90% sparse linear layers in ViTs, the authors report up to a 3.13x inference speedup and a 1.59x training speedup on a GPU relative to equivalent unstructured layers, while maintaining accuracy.

Recent advances in Dynamic Sparse Training (DST) have pushed the frontier of sparse neural network training in structured and unstructured contexts, matching dense-model performance while drastically reducing parameter counts to facilitate model scaling. However, unstructured sparsity often fails to translate into practical speedups on modern hardware. To address this shortcoming, we propose DynaDiag, a novel structured sparse-to-sparse DST method that performs at par with unstructured sparsity. DynaDiag enforces a diagonal sparsity pattern throughout training and preserves sparse computation in forward and backward passes. We further leverage the diagonal structure to accelerate computation via a custom CUDA kernel, rendering the method hardware-friendly. Empirical evaluations on diverse neural architectures demonstrate that our method maintains accuracy on par with unstructured counterparts while benefiting from tangible computational gains. Notably, with 90% sparse linear layers in ViTs, we observe up to a 3.13x speedup in online inference without sacrificing model performance and a 1.59x speedup in training on a GPU compared to equivalent unstructured layers. Our source code is available at https://github.com/horizon-research/DynaDiag/.
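
To make the idea of a diagonal sparsity pattern concrete, here is a minimal NumPy sketch. It is not the authors' implementation or their CUDA kernel: the wrap-around diagonal layout, the function names `diagonal_mask` and `diagonal_matvec`, and the fixed list of offsets are all assumptions made for illustration. The sketch only shows why such a pattern is regular enough to compute with directly, since each kept diagonal contributes exactly one weight per row.

```python
import numpy as np


def diagonal_mask(n_rows, n_cols, offsets):
    """Boolean mask that keeps wrapped diagonals (hypothetical layout).

    Entry (i, j) is kept when (j - i) mod n_cols is one of `offsets`.
    """
    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    return np.isin((cols - rows) % n_cols, list(offsets))


def diagonal_matvec(weights, offsets, x):
    """Compute (weights * mask) @ x while reading only the kept diagonals.

    Assumes the offsets are distinct modulo the number of columns.
    """
    n_rows, n_cols = weights.shape
    rows = np.arange(n_rows)
    y = np.zeros(n_rows, dtype=weights.dtype)
    for off in offsets:
        cols = (rows + off) % n_cols  # column hit by this diagonal in each row
        y += weights[rows, cols] * x[cols]
    return y


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 10))
x = rng.standard_normal(10)
offsets = [0, 3, 7]  # illustrative choice, not taken from the paper

mask = diagonal_mask(8, 10, offsets)
dense_result = (W * mask) @ x          # masked dense product
sparse_result = diagonal_matvec(W, offsets, x)  # touches 3 weights per row
print(np.allclose(dense_result, sparse_result))
print("sparsity:", 1.0 - mask.mean())
```

Under this assumed layout, each row keeps one weight per selected diagonal, so the sparsity is 1 - K / n_cols for K distinct offsets. Keeping a single diagonal of a 10-column layer would therefore correspond to the 90% sparsity level quoted in the abstract.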

Code Implementations: 1 repo
