Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
This work addresses hardware compatibility and efficiency for AI practitioners who use sparse neural networks. It offers a practical, incremental solution that builds on existing structured sparse accelerators.
The paper tackles the inflexibility of structured sparsity in deep neural network accelerators by proposing an approximation method to convert unstructured sparsity into structured forms, enabling acceleration without fine-tuning and achieving up to 83% energy-delay-product improvement and 39% speed-up on real hardware.
Exploiting sparsity in deep neural networks (DNNs) has been a promising approach to meeting growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support, but it provides limited flexibility and requires extra model fine-tuning. Moreover, a sparse model fine-tuned for one kind of structured sparse hardware cannot be accelerated by other structured sparse hardware. To accelerate DNNs with unstructured sparsity on structured sparse hardware, we propose an approximation method that leverages the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. We also develop a software framework, TASDER, to apply high-quality structured approximation to the weights and activations of DNNs. Our method accelerates dense and sparse DNNs without fine-tuning and improves energy-delay-product (EDP) by up to 83% and 74%. It achieves up to 39% speed-up on a real system.
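The distributive-property idea can be illustrated with a small NumPy sketch. It is not the TASDER implementation: the function names `nm_prune` and `structured_terms`, the greedy largest-magnitude selection, and the choice of 2:4 patterns are assumptions made here for illustration only. The sketch splits a matrix A into N:M structured sparse terms A1, A2, ... (at most N nonzeros in every group of M consecutive elements), so that A·B can be computed as A1·B + A2·B + ..., with each product being of a form that structured sparse hardware can run.

```python
import numpy as np

def nm_prune(x, n, m):
    """Keep the n largest-magnitude entries in each group of m along the last axis."""
    rows, cols = x.shape
    assert cols % m == 0, "number of columns must be a multiple of m"
    groups = x.reshape(rows, cols // m, m)
    # Positions of the n largest |values| inside every group of m.
    idx = np.argsort(-np.abs(groups), axis=-1)[..., :n]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return (groups * mask).reshape(rows, cols)

def structured_terms(x, configs):
    """Greedily peel off one N:M structured term per (n, m) entry in configs.

    Returns the list of terms and the residual left uncovered.
    (Illustrative assumption: greedy magnitude-based extraction.)
    """
    residual = x.copy()
    terms = []
    for n, m in configs:
        term = nm_prune(residual, n, m)
        terms.append(term)
        residual = residual - term
    return terms, residual

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
A[rng.random(A.shape) < 0.5] = 0.0   # unstructured sparsity
B = rng.standard_normal((8, 3))

# Two 2:4 terms cover all four positions of every group, so the sum is exact.
terms, residual = structured_terms(A, [(2, 4), (2, 4)])
approx = sum(t @ B for t in terms)   # distributive property: (A1 + A2) B = A1 B + A2 B

# A single 2:4 term is only an approximation; the residual is what gets dropped.
one_term, one_residual = structured_terms(A, [(2, 4)])
```

With two 2:4 terms the decomposition reproduces A exactly, so `approx` matches `A @ B` up to floating-point rounding. With fewer terms, the residual measures the approximation error, which is the quantity a framework would need to keep small when choosing how many terms to use per layer.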