AR LG Jan 21, 2021

Direct Spatial Implementation of Sparse Matrix Multipliers for Reservoir Computing

arXiv:2101.08884v1

Originality: Incremental advance

AI Analysis

This work addresses the sparse matrix multiplication bottleneck in reservoir computing systems. It reports large latency reductions and argues for power savings, but the advance is incremental because it builds on existing spatial implementation techniques.

The paper accelerates sparse matrix multiplication for reservoir computing by implementing the fixed matrix directly in FPGA logic, using bit-serial arithmetic and canonical signed digit representation. The reported latency reductions are 50x to 86x compared to GPU libraries and 4.1x to 47x compared to a sparse DNN accelerator.

Reservoir computing systems rely on the recurrent multiplication of a very large, sparse, fixed matrix. We argue that direct spatial implementation of these fixed matrices minimizes the work performed in the computation, and allows for significant reduction in latency and power through constant propagation and logic minimization. Bit-serial arithmetic enables massive static matrices to be implemented. We present the structure of our bit-serial matrix multiplier, and evaluate using canonical signed digit representation to further reduce logic utilization. We have implemented these matrices on a large FPGA and provide a cost model that is simple and extensible. These FPGA implementations, on average, reduce latency by 50x up to 86x versus GPU libraries. Comparing against a recent sparse DNN accelerator, we measure a 4.1x to 47x reduction in latency depending on matrix dimension and sparsity. Throughput of the FPGA solution is also competitive for a wide range of matrix dimensions and batch sizes. Finally, we discuss ways these techniques could be deployed in ASICs, making them applicable for dynamic sparse matrix computations.
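To make two ideas from the abstract concrete, here is a minimal software sketch of canonical signed digit (CSD) recoding and of a fixed sparse matrix-vector product built from shifts and adds only. It is an illustration under stated assumptions, not the paper's hardware design: the function names `to_csd` and `csd_matvec`, the integer weights, and the example matrix are all hypothetical, and the loop models a word-level computation rather than the bit-serial datapath.

```python
def to_csd(n: int) -> list[int]:
    """Recode integer n into CSD digits in {-1, 0, 1}, least significant first.

    CSD guarantees that no two adjacent digits are both nonzero, which
    tends to reduce the number of nonzero digits versus plain binary.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 -> digit +1; n mod 4 == 3 -> digit -1
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_matvec(matrix: list[list[int]], x: list[int]) -> list[int]:
    """Multiply a fixed integer matrix by x using only shifts, adds, and subtracts.

    Assumption for illustration: each nonzero CSD digit stands for one
    add/subtract in a spatial implementation, and zero weights cost nothing
    (a software analogue of constant propagation removing unused logic).
    """
    out = []
    for row in matrix:
        acc = 0
        for w, xj in zip(row, x):
            if w == 0:
                continue  # zero weight: nothing to compute
            for shift, d in enumerate(to_csd(w)):
                if d:
                    acc += d * (xj << shift)
        out.append(acc)
    return out


# Hypothetical fixed sparse matrix and input vector.
W = [[7, 0, -3],
     [0, 0, 5]]
x = [2, 4, 1]
y = csd_matvec(W, x)
```

In this sketch the weight 7 is recoded as 8 - 1, so it needs two nonzero digits instead of the three ones in its binary form. Counting nonzero digits per weight is used here only as a rough proxy for logic cost; the cost model in the paper may differ.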
