LG · Oct 16, 2025

Efficient Dynamic Structured Sparse Training with Learned Shuffles

arXiv:2510.14812v1 · h-index: 29
Originality: Highly original
AI Analysis

This work addresses the efficiency-accuracy trade-off in sparse training and is aimed at deep learning practitioners. Its hybrid approach, structured sparsity combined with learned permutations, builds incrementally on existing methods but has practical impact.

The paper tackles the accuracy gap between structured and unstructured sparsity in neural networks. It proposes learning a permutation for each layer to restore expressivity. The resulting method matches unstructured baselines at 90-95% sparsity on ImageNet-1K and WikiText-103 while training up to 1.21x faster and running inference up to 2.9x faster.

Structured sparsity accelerates training and inference on modern GPUs, yet it still trails unstructured dynamic sparse training (DST) in accuracy. The shortfall stems from a loss of expressivity: whereas a dense layer can realize every possible mask obtained by choosing any $w$ active weights out of $n$, a fixed block or N:M layout explores only a subset of those possibilities. We propose to close this gap by learning, for each layer, a single permutation matrix jointly with the structured weight matrix. Applied to three canonical structures -- block, N:M, and diagonals -- we show that permutation-augmented DST (PA-DST) matches unstructured baselines (RigL, SET) at 90--95\% sparsity on ImageNet-1K (ViT-B/16) and WikiText-103 (GPT-2), yet trains up to $1.21\times$ and infers up to $2.9\times$ faster. The results position structure + learned permutation as a sweet spot between accuracy and efficiency.
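
The expressivity argument in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation. The function names, the 8x8 layer size, and the block size are all hypothetical, and the permutation here is drawn at random rather than learned, since this page does not describe how PA-DST parameterizes or trains it. The sketch shows two things: how many masks a 2:4 layout can reach compared with an unconstrained choice of $w$ out of $n$ weights, and how composing a block-diagonal layer with an input permutation yields a different effective connectivity pattern at the same sparsity.

```python
from math import comb

import numpy as np


def count_unstructured_masks(n, w):
    """Number of masks that keep any w of n weights."""
    return comb(n, w)


def count_n_m_masks(n, keep, group):
    """Number of masks when every group of `group` weights keeps exactly `keep`."""
    assert n % group == 0
    return comb(group, keep) ** (n // group)


def block_diagonal_mask(dim, block):
    """Boolean mask with dense `block` x `block` tiles on the diagonal."""
    assert dim % block == 0
    mask = np.zeros((dim, dim), dtype=bool)
    for start in range(0, dim, block):
        mask[start:start + block, start:start + block] = True
    return mask


def permuted_structured_forward(x, weight, mask, perm):
    """Compute (W * mask) @ (P x), where (P x)[j] = x[perm[j]]."""
    return (weight * mask) @ x[perm]


# Mask counting for 8 weights at 50% sparsity.
print(count_unstructured_masks(8, 4))  # 70
print(count_n_m_masks(8, 2, 4))        # 36

# Hypothetical toy layer: 8x8, block size 2, random stand-in permutation.
rng = np.random.default_rng(0)
dim, block = 8, 2
mask = block_diagonal_mask(dim, block)
weight = rng.standard_normal((dim, dim))
perm = rng.permutation(dim)  # assumption: stands in for a learned permutation
x = rng.standard_normal(dim)

y = permuted_structured_forward(x, weight, mask, perm)

# Equivalent dense view: column j of the structured matrix moves to column perm[j].
effective = np.zeros_like(weight)
effective[:, perm] = weight * mask
effective_mask = effective != 0

# Same output and same number of active weights, but a different layout.
print(np.allclose(y, effective @ x))
print(int(effective_mask.sum()), int(mask.sum()))
```

In this toy setting the structured kernel still multiplies by a block-diagonal matrix, so the hardware-friendly layout is preserved. The permutation only reorders the inputs, which is why the effective mask can leave the block-diagonal pattern without adding active weights.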

