Apr 21

Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design

arXiv:2604.19293 · 22.57 citations
AI Analysis

The paper addresses the need for energy-efficient LSTM inference on resource-constrained embedded devices, but its gains over existing FPGA accelerators are incremental.

The paper proposes a parameterised hardware accelerator for LSTMs on embedded FPGAs that achieves an energy efficiency of 11.89 GOP/s/W at an inference rate of 32,873 samples/s, outperforming prior work in both speed and energy consumption.
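An energy-efficiency figure in GOP/s/W is throughput (GOP/s) divided by power (W), and throughput is operations per sample multiplied by samples per second. The summary does not state the evaluated network size or the op-counting convention, so this sketch uses a hypothetical single-layer LSTM (input size 6, hidden size 20), counts one multiply-accumulate as two operations, and treats one sample as one time step. All three choices, and the helper names, are assumptions; the resulting power value is illustrative only, not a figure from the paper.

```python
def lstm_ops_per_step(input_size: int, hidden_size: int) -> int:
    """Approximate op count for one LSTM time step (hypothetical convention).

    Each multiply-accumulate counts as 2 ops; activation functions are not counted.
    """
    # Four gates, each a matrix-vector product over [x; h]: hidden_size outputs,
    # each needing (input_size + hidden_size) multiply-accumulates.
    mac_ops = 4 * hidden_size * (input_size + hidden_size) * 2
    # One bias addition per gate output.
    bias_ops = 4 * hidden_size
    # Element-wise state update: c = f*c + i*g (3 ops), h = o*tanh(c) (1 op).
    elementwise_ops = 4 * hidden_size
    return mac_ops + bias_ops + elementwise_ops


def throughput_gops(ops_per_sample: int, samples_per_s: float) -> float:
    """Throughput in giga-operations per second."""
    return ops_per_sample * samples_per_s / 1e9


def implied_power_w(gops: float, gops_per_watt: float) -> float:
    """Power (W) implied by a throughput and an energy-efficiency figure."""
    return gops / gops_per_watt


if __name__ == "__main__":
    # Hypothetical model size: the summary does not state the evaluated network.
    ops = lstm_ops_per_step(input_size=6, hidden_size=20)
    gops = throughput_gops(ops, samples_per_s=32873)    # reported inference rate
    watts = implied_power_w(gops, gops_per_watt=11.89)  # reported efficiency
    print(f"{ops} ops/sample, {gops:.4f} GOP/s, ~{watts * 1000:.1f} mW")
```

Because the implied power scales linearly with the assumed op count, the same 11.89 GOP/s/W could correspond to quite different absolute power draws depending on the network that was actually benchmarked.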

Long Short-term Memory Networks (LSTMs) are a vital Deep Learning technique suitable for performing on-device time series analysis on local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs specially optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves the execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations using a number of optimisation parameters, such as the usage of DSPs or the implementation of activation functions. We present our key design decisions and evaluate the performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during a real-time inference with 32873 samples/s.
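The abstract names the implementation of activation functions as one of the design's tunable parameters. As a rough illustration of what such a switch affects, this sketch evaluates one standard LSTM time step with either exact sigmoid/tanh or piecewise-linear "hard" approximations of the kind often used in hardware. The gate ordering (i, f, g, o), the hard-sigmoid slope of 0.25, the floating-point arithmetic, and all function names are assumptions for illustration; the paper's actual activation implementations and number formats are not described here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid_exact(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_hard(x: float) -> float:
    # Piecewise-linear approximation: slope 0.25 matches sigmoid'(0), clipped to [0, 1].
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def tanh_hard(x: float) -> float:
    # Piecewise-linear approximation: identity clipped to [-1, 1].
    return min(1.0, max(-1.0, x))


# Hypothetical configuration switch standing in for an "activation function" parameter.
ACTIVATIONS = {
    "exact": (sigmoid_exact, math.tanh),
    "hard": (sigmoid_hard, tanh_hard),
}


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(x: Vector, h: Vector, c: Vector, W: Matrix, U: Matrix, b: Vector,
              activation: str = "exact") -> Tuple[Vector, Vector]:
    """One LSTM time step. W is 4H x I, U is 4H x H, b has 4H entries (gate order i, f, g, o)."""
    sig, tanh = ACTIVATIONS[activation]
    n = len(h)
    # Pre-activations for all four gates: W x + U h + b.
    z = [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
    i = [sig(v) for v in z[0:n]]            # input gate
    f = [sig(v) for v in z[n:2 * n]]        # forget gate
    g = [tanh(v) for v in z[2 * n:3 * n]]   # candidate cell state
    o = [sig(v) for v in z[3 * n:4 * n]]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


if __name__ == "__main__":
    # Tiny hypothetical cell (I = 1, H = 1) comparing the two activation choices.
    W = [[0.5], [0.2], [1.0], [-0.3]]
    U = [[0.1], [0.4], [-0.2], [0.3]]
    b = [0.0, 1.0, 0.0, 0.0]
    for mode in ("exact", "hard"):
        h, c = lstm_step([1.0], [0.0], [0.0], W, U, b, activation=mode)
        print(mode, h, c)
```

The hard variants replace exponentials with a multiply, an add, and a clamp, which is one reason such approximations are attractive on small FPGAs; the cost is a numerical deviation from the exact functions that has to be checked against model accuracy.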
