Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication
This work addresses energy efficiency for edge and vision processing applications. The contribution is incremental: it builds on existing systolic array designs and changes only the processing elements.
The paper tackles energy inefficiency in matrix multiplication for deep neural networks by proposing a systolic array architecture with novel exact and approximate processing elements. The exact and approximate designs achieve energy savings of 22% and 32%, respectively, compared to an existing design, while maintaining competitive output quality in applications such as DCT and edge detection.
Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and approximate processing elements (PEs), designed using energy-efficient positive partial product and negative partial product cells, termed PPC and NPPC, respectively. The proposed 8-bit exact and approximate PE designs are employed in an 8x8 systolic array, which achieves energy savings of 22% and 32%, respectively, compared to the existing design. To demonstrate their effectiveness, the proposed PEs are integrated into a systolic array (SA) for Discrete Cosine Transform (DCT) computation, achieving high output quality with a PSNR of 38.21 dB. Furthermore, in an edge detection application using convolution, the approximate PE achieves a PSNR of 30.45 dB. These results highlight the potential of the proposed design to deliver significant energy efficiency while maintaining competitive output quality, making it well-suited for error-resilient image and vision processing applications.
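To make the dataflow and the quality metric concrete, the following is a minimal Python sketch of an output-stationary systolic array computing a matrix product, together with a PSNR helper. It is an illustration under stated assumptions, not the paper's design: the output-stationary dataflow, the function names (`exact_mac`, `approx_mac`, `systolic_matmul`, `psnr`), and the `dropped_bits` parameter are hypothetical. In particular, `approx_mac` simply zeroes the low bits of each product as a generic stand-in for approximation and does not model the PPC/NPPC cells.

```python
import numpy as np


def exact_mac(acc, a, b):
    """Exact multiply-accumulate performed by one processing element (PE)."""
    return acc + int(a) * int(b)


def approx_mac(acc, a, b, dropped_bits=2):
    """Hypothetical approximate MAC: zero the low bits of each product.

    Illustrative stand-in only; this is NOT the paper's PPC/NPPC-based PE.
    """
    product = int(a) * int(b)
    return acc + ((product >> dropped_bits) << dropped_bits)


def systolic_matmul(A, B, mac=exact_mac):
    """Cycle-level simulation of an output-stationary systolic array for A @ B.

    Assumed dataflow: row i of A enters from the left delayed by i cycles,
    column j of B enters from the top delayed by j cycles, so PE (i, j)
    sees the operand pair (A[i, t], B[t, j]) at cycle t + i + j.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    acc = [[0] * m for _ in range(n)]
    total_cycles = k + n + m - 2  # last operand pair reaches PE (n-1, m-1)
    for cycle in range(total_cycles):
        for i in range(n):
            for j in range(m):
                t = cycle - i - j  # index of the operand pair arriving now
                if 0 <= t < k:
                    acc[i][j] = mac(acc[i][j], A[i, t], B[t, j])
    return np.array(acc, dtype=np.int64)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally shaped arrays."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(test, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


# Usage: compare exact and approximate 8x8 products of signed 8-bit operands.
rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(8, 8))
B = rng.integers(-128, 128, size=(8, 8))
exact = systolic_matmul(A, B)
approx = systolic_matmul(A, B, mac=approx_mac)
max_abs_error = int(np.max(np.abs(exact - approx)))
```

With `dropped_bits=2`, each product loses at most 3, so an 8-term accumulation deviates from the exact result by at most 24. The PSNR figures quoted in the abstract are measured on application outputs (DCT-reconstructed and edge-detected images), where a peak value of 255 for 8-bit pixels is the usual convention; the `peak` default here follows that convention as an assumption.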