AR · Apr 29

Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators

arXiv:2604.26587

AI Analysis

For hardware designers of neural network accelerators, this work challenges the conventional wisdom that specialized sparse accelerators are necessary, demonstrating that dense accelerators can be more efficient for sparse networks.

The paper shows that using a larger number of dense processing elements (PEs) for sparse neural network computation is more area- and energy-efficient than using dedicated sparse PEs, and proposes Sparse-on-Dense, a method that leverages dense matrix multiplication accelerators for sparse networks.

As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, DNNs require a large amount of computation and a large memory footprint. Pruning, which produces a sparse neural network, is one way to reduce the computational complexity of neural network processing. To maximize the performance of computations on such compressed data, dedicated sparse neural network accelerators have been introduced, but the complex circuits that match the indices of non-zero inputs and weights add a large area and power overhead to the processing elements (PEs). The sparse PE becomes significantly larger than the dense PE, which raises an interesting question for designers: "Given the area, isn't it better to use a larger number of dense PEs despite the low utilization in sparse matrix computations?" In this paper, we show that the answer is "yes", and demonstrate an area- and energy-efficient method for sparse neural network computing on dense matrix multiplication hardware accelerators (Sparse-on-Dense).
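The designers' question above can be made concrete with a back-of-the-envelope model. The sketch below is not the paper's method or evaluation. It assumes a deliberately naive setting in which a dense PE does useful work only on the non-zero fraction of operands, while a sparse PE is always busy but occupies more area. The function names, parameters, and example numbers are all hypothetical and chosen only for illustration.

```python
def effective_macs_per_cycle(area_budget, pe_area, utilization):
    """Useful multiply-accumulates per cycle for an array of identical PEs.

    area_budget and pe_area share the same (arbitrary) unit; utilization is
    the fraction of cycles in which a PE does useful work. All values are
    hypothetical inputs, not figures from the paper.
    """
    num_pes = int(area_budget // pe_area)  # whole PEs that fit in the budget
    return num_pes * utilization


def dense_wins(area_budget, dense_pe_area, sparse_pe_area, density,
               sparse_utilization=1.0):
    """Return True if the dense array matches or beats the sparse array.

    Naive assumption: a dense PE is useful only for the non-zero fraction
    (density) of its operands, and a sparse PE skips zeros perfectly.
    """
    dense = effective_macs_per_cycle(area_budget, dense_pe_area, density)
    sparse = effective_macs_per_cycle(area_budget, sparse_pe_area,
                                      sparse_utilization)
    return dense >= sparse


if __name__ == "__main__":
    # Hypothetical case: sparse PE is 5x the dense PE, 25% of operands non-zero.
    print(dense_wins(100.0, 1.0, 5.0, 0.25))
    # Hypothetical case: sparse PE is only 2x the dense PE at the same density.
    print(dense_wins(100.0, 1.0, 2.0, 0.25))
```

Under this simplified model, the dense array comes out ahead whenever the sparse-to-dense PE area ratio is at least the reciprocal of the density. Energy, memory traffic, and any scheduling that raises dense utilization are left out, so the sketch only illustrates the shape of the trade-off.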
