AR · LG · Sep 15, 2023

A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge

Longwei Huang, Chao Fang, Qiong Li, Jun Lin, Zhongfeng Wang

arXiv:2309.08186v1 · 10.325 citations · h-index: 25

Originality: Incremental advance

AI Analysis

The processor enables efficient, private on-device learning for applications such as in-vehicle smart devices, though the advance is incremental because it builds on existing hardware methods.

The paper tackles the challenge of deploying quantized deep neural networks on extreme edge devices by proposing a precision-scalable RISC-V processor that supports fixed-point inference from 2-bit to 16-bit and FP16 on-device learning, achieving up to 14.6× higher inference throughput and 16.5× higher FP throughput compared to prior work.

Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle the challenges above, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It facilitates diverse precision levels of fixed-point DNN inference, spanning from 2-bit to 16-bit, and enhances on-device learning through improved support with FP16 operations. Moreover, we employ multiple methods such as FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that our processor significantly improves inference throughput by 1.6× to 14.6× and energy efficiency by 1.1× to 14.6× across various DNNs, compared to the prior art, XpulpNN. Additionally, our processor achieves a 16.5× higher FP throughput for on-device learning.
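The abstract names multi-precision integer multiplier reuse but does not describe the circuit. As a rough illustration of the general idea only, the Python sketch below builds one full-width signed product from four half-width partial products, and shows that the same four narrow multiplies could instead serve four independent low-precision products. The function names (`split_halves`, `mul_from_halves`, `four_narrow_products`) and the decomposition itself are assumptions made for illustration, not the authors' design.

```python
def split_halves(value, bits):
    """Split a signed `bits`-wide integer into (signed high half, unsigned low half).

    Hypothetical helper: value == high * 2**(bits // 2) + low always holds.
    """
    half = bits // 2
    low = value & ((1 << half) - 1)   # low half, treated as unsigned
    high = value >> half              # arithmetic shift keeps the sign
    return high, low


def mul_from_halves(a, b, bits):
    """Full-width signed multiply built from four half-width multiplies.

    Illustrates the high-precision mode of a reusable multiplier array.
    """
    half = bits // 2
    a_hi, a_lo = split_halves(a, bits)
    b_hi, b_lo = split_halves(b, bits)
    # Four narrow products, the units a hardware design could share.
    hh = a_hi * b_hi
    hl = a_hi * b_lo
    lh = a_lo * b_hi
    ll = a_lo * b_lo
    # Shift-and-add recombination into the wide result.
    return (hh << (2 * half)) + ((hl + lh) << half) + ll


def four_narrow_products(pairs):
    """Low-precision mode: the same four multiplies used on four independent
    operand pairs, summed as in a dot product."""
    assert len(pairs) == 4
    return sum(x * w for x, w in pairs)


if __name__ == "__main__":
    print(mul_from_halves(-1234, 567, 16))
    print(four_narrow_products([(1, 2), (-3, 4), (5, -6), (7, 7)]))
```

In this sketch, one 16-bit multiply and four narrow multiplies consume the same number of half-width products, which is the intuition behind throughput rising as precision drops.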
