SD · AI · LG · AS · Sep 5, 2024

aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio

arXiv:2409.03377v4 · 3 citations · h-index: 3
AI Analysis

This work addresses speech enhancement for real-time applications, offering incremental improvements in efficiency and performance for noisy audio processing.

The authors tackled real-time speech enhancement by introducing aTENNuate, a deep state-space autoencoder for raw audio processing, which outperformed previous real-time models in PESQ score, parameter count, MACs, and latency, and showed effectiveness in low-resource settings like 4000Hz and 4-bit compression.

We present aTENNuate, a simple deep state-space autoencoder configured for efficient online raw speech enhancement in an end-to-end fashion. The network's performance is primarily evaluated on raw speech denoising, with additional assessments on tasks such as super-resolution and de-quantization. We benchmark aTENNuate on the VoiceBank + DEMAND and the Microsoft DNS1 synthetic test sets. The network outperforms previous real-time denoising models in terms of PESQ score, parameter count, MACs, and latency. Even as a raw waveform processing model, the model maintains high fidelity to the clean signal with minimal audible artifacts. In addition, the model remains performant even when the noisy input is compressed down to 4000Hz and 4 bits, suggesting general speech enhancement capabilities in low-resource environments. Try it out by pip install attenuate
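To make the low-resource setting concrete, the following is a minimal sketch of what compressing a waveform "down to 4000Hz and 4 bits" could look like. It is an illustration only, not the authors' preprocessing: the 16000 Hz starting rate, the block-averaging downsampler, the uniform quantizer, and the helper name degrade are all assumptions made here, and the paper may use a different resampling or quantization scheme.

```python
import math

def degrade(wave, factor=4, bits=4):
    """Hypothetical helper: downsample by `factor`, then quantize to `bits` bits.

    `wave` is a list of floats assumed to lie in [-1, 1].
    """
    # Average non-overlapping blocks of `factor` samples.
    # This is a crude low-pass filter plus decimation (an assumption, not the paper's method).
    usable = len(wave) - len(wave) % factor
    low_rate = [sum(wave[i:i + factor]) / factor for i in range(0, usable, factor)]

    # Uniformly quantize [-1, 1] into 2**bits integer codes.
    levels = 2 ** bits
    codes = []
    for x in low_rate:
        x = max(-1.0, min(1.0, x))  # clip to the valid range
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))

    # Map the integer codes back to floats in [-1, 1].
    quantized = [c / (levels - 1) * 2.0 - 1.0 for c in codes]
    return quantized, codes

# One second of a 220 Hz tone at an assumed 16000 Hz sample rate.
sample_rate = 16000
clean = [0.8 * math.sin(2 * math.pi * 220.0 * n / sample_rate) for n in range(sample_rate)]

# A factor of 4 brings 16000 Hz down to 4000 Hz; 4 bits allows 16 amplitude levels.
degraded, codes = degrade(clean, factor=4, bits=4)
```

A signal degraded this way keeps only a quarter of the samples and at most 16 distinct amplitude values, which is the kind of input the abstract says the model still enhances well.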
