SD LG AS Sep 26, 2024

Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables

Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, John R. Hershey, Richard F. Lyon

DeepMind

arXiv:2409.18239v2 4.95 citations h-index: 53

Originality: Incremental advance

AI Analysis

This work addresses low-latency speech enhancement for hearing aids and hearables, offering incremental improvements in latency and performance for this domain.

The paper tackles sub-millisecond-latency real-time speech enhancement on resource-constrained hearables, reporting a mean algorithmic latency of 0.32 ms to 1.25 ms and a mean SI-SDRi of 4.1 dB with a single microphone.
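For readers unfamiliar with the metric, SI-SDRi is the scale-invariant signal-to-distortion ratio of the enhanced signal minus that of the unprocessed noisy input, both measured against the clean reference. The sketch below uses the standard textbook definition; it is not taken from the paper's evaluation code, and the function names and toy signals are made up for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (standard definition, illustrative only)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference to remove any gain mismatch.
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    # Note: a perfect estimate has zero residual energy and would divide by zero.
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(enhanced, noisy, reference):
    """SI-SDRi: gain of the enhanced signal over the noisy input, in dB."""
    return si_sdr(enhanced, reference) - si_sdr(noisy, reference)


# Toy signals (hypothetical): the added noise is orthogonal to the reference.
reference = [1.0, -1.0, 1.0, -1.0]
noisy = [1.5, -0.5, 0.5, -1.5]        # reference + noise of amplitude 0.5
enhanced = [1.25, -0.75, 0.75, -1.25]  # reference + noise of amplitude 0.25
improvement_db = si_sdr_improvement(enhanced, noisy, reference)
```

In this toy case halving the noise amplitude cuts the residual energy by a factor of four, so the improvement is about 6 dB.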

Low-latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing to achieve a mean algorithmic latency of 0.32 ms to 1.25 ms. With a single microphone, we observe a mean SI-SDRi of 4.1 dB. The approach shows generalization with a DNSMOS increase of 0.2 on unseen audio recordings. We use a lightweight LSTM-based model of 626k parameters to generate FIR taps. Using a real hardware implementation on a low-power DSP, our system can run with 376 MIPS and a mean end-to-end latency of 3.35 ms. In addition, we provide a comparison with existing low-latency spectral masking techniques. We hope this work will enable a better understanding of latency and can be used to improve the comfort and usability of hearables.
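The sample-by-sample processing described in the abstract can be pictured with a small streaming FIR filter: each incoming sample is pushed into a delay line and one output sample is produced immediately, with no frame buffering. This is a minimal sketch under stated assumptions, not the authors' implementation. The class name `StreamingFIR`, the three-tap length, and the tap values are hypothetical; in the paper the taps are generated by the LSTM model, and the minimum-phase design step is not shown here.

```python
from collections import deque


class StreamingFIR:
    """Sample-by-sample FIR filter with replaceable taps (illustrative only)."""

    def __init__(self, num_taps):
        # Delay line of the most recent input samples, newest first.
        self.history = deque([0.0] * num_taps, maxlen=num_taps)
        # Start as an identity filter until taps are supplied.
        self.taps = [1.0] + [0.0] * (num_taps - 1)

    def set_taps(self, taps):
        # The paper's taps come from an LSTM; here they are passed in by hand.
        if len(taps) != len(self.taps):
            raise ValueError("tap count must match the delay line length")
        self.taps = list(taps)

    def process(self, sample):
        # Push the new sample (the oldest one falls off) and emit one output.
        self.history.appendleft(sample)
        return sum(h * x for h, x in zip(self.taps, self.history))


fir = StreamingFIR(num_taps=3)
fir.set_taps([0.5, 0.25, 0.25])  # hypothetical taps, not from the paper
impulse_response = [fir.process(x) for x in [1.0, 0.0, 0.0, 0.0]]
# impulse_response == [0.5, 0.25, 0.25, 0.0]
```

Feeding a unit impulse returns the taps themselves, which is a quick check that the delay line is ordered correctly. Because the taps can be swapped between calls to `process`, the same structure accommodates a time-varying filter.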
