Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid
For hearing aid developers, this work provides concrete latency and memory measurements showing that DNN-based denoising is feasible on an embedded FPGA, while speech separation, at 16.0 ms, still exceeds the 10 ms latency threshold.
The paper evaluates the SuDoRM-RF++ speech enhancement model on an embedded FPGA (AMD-Xilinx Kria KV260) and shows that fixed-point denoising achieves 9.7 ms first-sample latency, meeting the 10 ms clinical threshold for hearing aids, while speech separation reaches 16.0 ms. Data movement is identified as the primary bottleneck.
Hearing aids impose strict latency and power constraints that current DNN-based speech enhancement systems struggle to meet on embedded hardware. We characterize this gap by deploying both speech separation and denoising using the lightweight SuDoRM-RF++ architecture on the AMD-Xilinx Kria KV260, evaluated at FP32 and 16-bit fixed-point precision for each task. Across these configurations, first-sample latency tracks with on-chip parameter caching rather than arithmetic throughput, identifying data movement as the primary bottleneck. Precision reduction halves the model memory footprint without compromising objective speech quality. The fixed-point denoising accelerator achieves a first-sample latency of 9.7 ms, meeting the 10 ms clinical threshold, while speech separation reaches 16.0 ms. These measurements establish concrete resource requirements for embedded DNN-based speech enhancement and quantify the remaining gap to hearing aid deployment.
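The claim that moving from FP32 to 16-bit fixed point halves the parameter footprint can be illustrated with a minimal sketch. The Q4.12 split (12 fractional bits), the saturation behavior, and the weight values below are assumptions chosen for illustration; the abstract does not state the fixed-point format or quantization scheme actually used.

```python
import struct

FRAC_BITS = 12  # hypothetical Q4.12 format; the source does not give the split


def to_fixed16(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed 16-bit fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    return max(-32768, min(32767, scaled))


def from_fixed16(q, frac_bits=FRAC_BITS):
    """Convert a 16-bit fixed-point integer back to a float."""
    return q / (1 << frac_bits)


# Made-up weights standing in for model parameters.
weights = [0.5, -0.25, 0.1234, 1.75, -3.0]

# Serialize as 32-bit floats and as 16-bit signed integers to compare storage.
fp32_bytes = struct.pack(f"<{len(weights)}f", *weights)
fixed = [to_fixed16(w) for w in weights]
fx16_bytes = struct.pack(f"<{len(fixed)}h", *fixed)

footprint_ratio = len(fx16_bytes) / len(fp32_bytes)
max_abs_error = max(abs(w - from_fixed16(q)) for w, q in zip(weights, fixed))

print(f"FP32: {len(fp32_bytes)} bytes, fixed16: {len(fx16_bytes)} bytes")
print(f"footprint ratio: {footprint_ratio}, max abs error: {max_abs_error:.6f}")
```

For in-range values the rounding error is bounded by half a least-significant bit (2^-13 under the assumed format). The sketch says nothing about the effect on speech quality, which the paper evaluates with objective metrics.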