LG · AI · AR · Oct 10, 2021

A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

arXiv:2110.04861v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient deep learning inference for edge computing, though it appears incremental as it builds on existing hardware acceleration and quantization techniques.

The paper accelerates deep learning inference on edge devices with a low-power MLP accelerator that combines pipelined matrix multiplication and non-uniform quantization. In tests on handwritten digit classification and Q-learning tasks, it reports better performance with reduced power consumption.

Matrix multiplication is the bedrock of deep learning inference applications. In hardware acceleration on edge computing devices, matrix multiplication often takes up the great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a non-uniform quantization methodology. The implementation runs on Field-Programmable Gate Array (FPGA) devices, and we test its performance on handwritten digit classification and Q-learning tasks. Results show that our method achieves better performance with lower power consumption.
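The abstract does not say which non-uniform quantization scheme the accelerator uses, so the following is only a minimal sketch of one common choice, power-of-two (logarithmic) quantization, applied to the weights of a small MLP. The function names (`quantize_pow2`, `quantize_matrix`, `mlp_forward`), the exponent range, and the ReLU activation are all assumptions made for illustration and are not taken from the paper. Power-of-two levels are denser near zero, where trained weights tend to cluster, and in hardware a multiplication by such a weight can in principle be replaced by a bit shift.

```python
import math

def quantize_pow2(w, min_exp=-6, max_exp=0):
    """Map w to the nearest signed power of two (nearest in the log2 domain).

    Hypothetical scheme for illustration: the levels are +/-2**e for
    e in [min_exp, max_exp], plus exact zero. The levels are non-uniformly
    spaced, with finer resolution for small magnitudes.
    """
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))           # nearest integer exponent
    exp = max(min_exp, min(max_exp, exp))    # clamp to the representable range
    return math.copysign(2.0 ** exp, w)

def quantize_matrix(W, **kwargs):
    """Quantize every entry of a weight matrix given as a list of rows."""
    return [[quantize_pow2(w, **kwargs) for w in row] for row in W]

def matvec(W, x):
    """Plain matrix-vector product, the core operation of MLP inference."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def mlp_forward(layers, x):
    """Run x through a stack of weight matrices, with ReLU between layers."""
    for i, W in enumerate(layers):
        x = matvec(W, x)
        if i < len(layers) - 1:   # no activation after the output layer
            x = relu(x)
    return x

# Usage: quantize a toy two-layer network and run one input through it.
W1 = quantize_matrix([[0.3, -0.7], [1.0, 0.1]])
W2 = quantize_matrix([[0.9, 1.2]])
y = mlp_forward([W1, W2], [1.0, 2.0])
```

This sketch covers only the arithmetic. The pipelining described in the abstract is a hardware scheduling concern and is not modeled here.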

