SD · LG · AS · Sep 26, 2024

Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables

DeepMind
arXiv:2409.18239v2 · 5 citations · h-index: 53
Originality: Incremental advance
AI Analysis

This work addresses low-latency speech enhancement for hearing aids and hearables, offering incremental improvements in latency and performance for this domain.

The paper tackles real-time speech enhancement at sub-millisecond algorithmic latency on resource-constrained hearables. It reports a mean algorithmic latency of 0.32 ms to 1.25 ms and, with a single microphone, a mean SI-SDRi of 4.1 dB.

Low latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing to achieve mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach shows generalization with a DNSMOS increase of 0.2 on unseen audio recordings. We use a lightweight LSTM-based model of 626k parameters to generate FIR taps. Using a real hardware implementation on a low-power DSP, our system can run with 376 MIPS and a mean end-to-end latency of 3.35 ms. In addition, we provide a comparison with existing low-latency spectral masking techniques. We hope this work will enable a better understanding of latency and can be used to improve the comfort and usability of hearables.
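The sample-by-sample FIR filtering described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `StreamingFIR` class, the `energy_centroid_ms` helper, the 4-tap length, and the 16 kHz sample rate are assumptions for illustration. In the paper the taps come from an LSTM-based model; here they are set by hand. The energy-weighted mean delay is only a rough proxy for how much delay a set of taps introduces, and may differ from the latency definition the authors use.

```python
from collections import deque
from typing import List, Sequence


class StreamingFIR:
    """Causal FIR filter that consumes one sample at a time (illustrative)."""

    def __init__(self, num_taps: int) -> None:
        # Start as a pass-through filter: h = [1, 0, 0, ...].
        self.taps: List[float] = [1.0] + [0.0] * (num_taps - 1)
        # history[0] is the newest sample, history[-1] the oldest.
        self.history = deque([0.0] * num_taps, maxlen=num_taps)

    def set_taps(self, taps: Sequence[float]) -> None:
        """Swap in new coefficients, e.g. from a tap-predicting model."""
        if len(taps) != len(self.taps):
            raise ValueError("tap count must stay fixed")
        self.taps = list(taps)

    def process(self, sample: float) -> float:
        """Push one input sample and return one output sample."""
        # appendleft on a full deque drops the oldest sample on the right.
        self.history.appendleft(sample)
        return sum(h * x for h, x in zip(self.taps, self.history))


def energy_centroid_ms(taps: Sequence[float], sample_rate_hz: float) -> float:
    """Energy-weighted mean delay of the impulse response, in milliseconds."""
    energy = [h * h for h in taps]
    centroid_samples = sum(n * e for n, e in enumerate(energy)) / sum(energy)
    return 1000.0 * centroid_samples / sample_rate_hz


if __name__ == "__main__":
    fir = StreamingFIR(num_taps=4)
    print([fir.process(x) for x in [1.0, 2.0, 3.0]])  # pass-through taps
    fir.set_taps([0.0, 1.0, 0.0, 0.0])                # one-sample delay
    print(fir.process(4.0))
    # Assumed 16 kHz sample rate: taps centred on index 2 delay by 2 samples.
    print(energy_centroid_ms([0.0, 0.0, 1.0, 0.0], 16_000))
```

Because each output depends only on the current and past input samples, no block of future audio has to be buffered before an output is produced. A minimum-phase design concentrates tap energy near index 0, which keeps a delay measure such as the one above small.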
