VUSA: Virtually Upscaled Systolic Array Architecture to Exploit Unstructured Sparsity in AI Acceleration
VUSA addresses the need for efficient AI accelerators, particularly for Edge-AI applications, with an application-independent design that works at any sparsity level. The contribution is incremental in that it builds on existing systolic-array methods.
The paper tackles inefficient deep neural network acceleration by introducing VUSA, a systolic-array architecture that exploits unstructured sparsity to virtually grow, so that it performs larger matrix multiplications with the same number of physical multiply-accumulate (MAC) units. At the same peak performance, it reports 37% area savings and a 68% power-efficiency improvement compared to a baseline systolic-array architecture.
Leveraging high degrees of unstructured sparsity is a promising approach to enhance the efficiency of deep neural network (DNN) accelerators, which is particularly important for emerging Edge-AI applications. We introduce VUSA, a systolic-array architecture that virtually grows based on the present sparsity to perform larger matrix multiplications with the same number of physical multiply-accumulate (MAC) units. The proposed architecture achieves savings of 37% and 68% in area and power efficiency, respectively, at the same peak performance, compared to a baseline systolic-array architecture in a commercial 16-nm technology. Still, the proposed architecture supports acceleration for any DNN with any sparsity, even no sparsity at all. Thus, the proposed architecture is application-independent, making it viable for general-purpose AI acceleration.
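The idea of an array that "virtually grows" with sparsity can be illustrated with a toy model. The sketch below is not the VUSA hardware mechanism, which the abstract does not detail; it only assumes that each array row owns a fixed budget of physical MAC units and that zero weights consume none of them. The function name `virtual_width` and the parameter `macs_per_row` are hypothetical, introduced purely for illustration.

```python
def virtual_width(weights, start_col, macs_per_row):
    """Toy model (an assumption, not the paper's design): return the widest
    window of columns, starting at start_col, in which every row holds at
    most macs_per_row nonzero weights."""
    n_cols = len(weights[0])
    used = [0] * len(weights)  # nonzero weights seen so far in each row
    width = 0
    for col in range(start_col, n_cols):
        # Tentatively add this column's nonzeros to each row's count.
        candidate = [u + (row[col] != 0) for u, row in zip(used, weights)]
        if max(candidate) > macs_per_row:
            break  # some row would need more MACs than it physically has
        used = candidate
        width += 1
    return width


# Dense weights: the window cannot exceed the physical MAC budget.
dense = [[1] * 6 for _ in range(2)]
print(virtual_width(dense, 0, macs_per_row=2))   # 2

# Unstructured sparsity: the same budget covers a wider window.
sparse = [[1, 0, 0, 1, 0, 0],
          [0, 1, 1, 0, 0, 1]]
print(virtual_width(sparse, 0, macs_per_row=2))  # 5
```

In this toy setting the dense case degrades gracefully to the physical width, which mirrors the abstract's claim that the architecture still works with no sparsity at all.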