SD LG AS | Aug 5, 2025

TF-MLPNet: Tiny Real-Time Neural Speech Separation

Malek Itani, Tuochao Chen, Shyamnath Gollakota

arXiv:2508.03047v1 | 4 citations | h-index: 49 | The 6th Clarity Workshop on Improving Speech-in-Noise for Hearing Devices (Clarity-2025)

Originality: Incremental advance

AI Analysis

TF-MLPNet brings real-time speech separation to tiny, low-power hearable devices, addressing the compute bottleneck that keeps state-of-the-art separation networks from running on such hardware.

The paper tackles the problem of real-time speech separation on low-power hearable devices by introducing TF-MLPNet, which processes 6 ms audio chunks on the GAP9 processor with a 3.5-4x runtime reduction compared to prior models while outperforming existing streaming models.
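The real-time constraint above comes down to simple arithmetic: each 6 ms chunk must be processed in at most 6 ms. The sketch below illustrates that budget. The 16 kHz sample rate and the 20 ms baseline latency are hypothetical values chosen for illustration, not figures from the paper, and the helper names are made up for this example.

```python
def samples_per_chunk(chunk_ms, sample_rate_hz):
    """Number of audio samples contained in one chunk."""
    return round(chunk_ms * sample_rate_hz / 1000)


def real_time_factor(inference_ms, chunk_ms):
    """Processing time divided by audio duration; at most 1.0 means real-time."""
    return inference_ms / chunk_ms


def is_real_time(inference_ms, chunk_ms=6.0):
    """A streaming model keeps up only if each chunk is processed within its own duration."""
    return inference_ms <= chunk_ms


def reduced_runtime_range(baseline_ms, low=3.5, high=4.0):
    """Per-chunk runtime after a low-x to high-x reduction, returned as (best case, worst case)."""
    return baseline_ms / high, baseline_ms / low


# Assumption: 16 kHz audio (the sample rate is not stated in the text above).
print(samples_per_chunk(6, 16_000))

# Assumption: a hypothetical prior model needing 20 ms per chunk.
best_ms, worst_ms = reduced_runtime_range(20.0)
print(is_real_time(20.0), is_real_time(best_ms), is_real_time(worst_ms))
```

Under these assumed numbers, a model that misses the 6 ms deadline before the reduction would meet it afterwards; the actual latencies are reported in the paper itself.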

Speech separation on hearable devices can enable transformative augmented and enhanced hearing capabilities. However, state-of-the-art speech separation networks cannot run in real-time on tiny, low-power neural accelerators designed for hearables, due to their limited compute capabilities. We present TF-MLPNet, the first speech separation network capable of running in real-time on such low-power accelerators while outperforming existing streaming models for blind speech separation and target speech extraction. Our network operates in the time-frequency domain, processing frequency sequences with stacks of fully connected layers that alternate along the channel and frequency dimensions, and independently processing the time sequence at each frequency bin using convolutional layers. Results show that our mixed-precision quantization-aware trained (QAT) model can process 6 ms audio chunks in real-time on the GAP9 processor, achieving a 3.5-4x runtime reduction compared to prior speech separation models.
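The architecture described in the abstract can be sketched as follows: within each time frame, fully connected layers alternate along the frequency and channel dimensions, and then each frequency bin's time sequence is filtered on its own by a convolution. This is a minimal pure-Python illustration, not the authors' implementation. The ReLU activation, the causal left padding, the single kernel shared across all channels and bins, the tensor layout `[T][C][F]`, and all function names are assumptions made for this example.

```python
import random


def linear(vec, weight, bias):
    """Fully connected layer; weight is [out][in], vec is [in]."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def relu(vec):
    return [max(0.0, x) for x in vec]


def mix_frequency(frame, w_freq, b_freq, w_chan, b_chan):
    """frame is [C][F] for one time step: FC along frequency, then FC along channels."""
    # FC along frequency: each channel's F-vector maps to a new F-vector.
    frame = [relu(linear(row, w_freq, b_freq)) for row in frame]
    # FC along channels: transpose to [F][C] so each bin's C-vector is mixed.
    cols = [relu(linear(list(col), w_chan, b_chan)) for col in zip(*frame)]
    return [list(row) for row in zip(*cols)]  # back to [C][F]


def causal_conv_time(seq, kernel):
    """1D convolution over time with left zero padding, so output t sees only inputs <= t."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(seq))]


def block(x, params):
    """x is [T][C][F]; returns a tensor of the same shape."""
    x = [mix_frequency(frame, *params["mix"]) for frame in x]
    n_t, n_c, n_f = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * n_f for _ in range(n_c)] for _ in range(n_t)]
    for c in range(n_c):
        for f in range(n_f):
            # Each (channel, frequency bin) time sequence is processed independently.
            y = causal_conv_time([x[t][c][f] for t in range(n_t)], params["kernel"])
            for t in range(n_t):
                out[t][c][f] = y[t]
    return out


def make_params(n_c, n_f, k, seed=0):
    """Random illustrative weights; a trained model would load learned values instead."""
    rng = random.Random(seed)

    def mat(n):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]

    return {
        "mix": (mat(n_f), [0.0] * n_f, mat(n_c), [0.0] * n_c),
        "kernel": [rng.uniform(-0.5, 0.5) for _ in range(k)],
    }


params = make_params(n_c=2, n_f=3, k=2)
x = [[[0.1 * (t + c + f) for f in range(3)] for c in range(2)] for t in range(4)]
y = block(x, params)
print(len(y), len(y[0]), len(y[0][0]))
```

Because the frequency and channel mixing never looks across time frames and the convolution in this sketch only looks backward, the output for a frame does not depend on later frames, which is the property a streaming model needs.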
