AS LG SD ML

Aug 17, 2020

Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

Michał Romaniuk, Piotr Masztalski, Karol Piaskowski, Mateusz Matuszewski

arXiv:2008.07244v1

2.36 citations

Originality: Incremental advance

AI Analysis

This work addresses computational limitations in mobile speech enhancement, but the advance is incremental because it builds on existing fully-convolutional architectures.

The paper tackles efficient low-latency speech enhancement for mobile devices by proposing Mobile Audio Streaming Networks (MASnet). MASnet uses depthwise and pointwise convolutions to reduce fused multiply-accumulate operations per second (FMA/s) relative to a similar fully-convolutional architecture, at some cost to SNR.
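To make the FMA saving from depthwise and pointwise convolutions concrete, the sketch below counts multiply-accumulates per output position for a standard 1-D convolution and for a depthwise-plus-pointwise replacement. The channel counts and kernel size are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def standard_conv_fma(c_in, c_out, kernel):
    """FMAs per output position for a standard 1-D convolution."""
    # Every output channel mixes all input channels across the full kernel.
    return c_in * c_out * kernel


def separable_conv_fma(c_in, c_out, kernel):
    """FMAs per output position for a depthwise conv followed by a pointwise conv."""
    depthwise = c_in * kernel   # one filter per input channel, no channel mixing
    pointwise = c_in * c_out    # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not MASnet's actual configuration).
c_in, c_out, kernel = 64, 64, 3

std = standard_conv_fma(c_in, c_out, kernel)
sep = separable_conv_fma(c_in, c_out, kernel)
ratio = sep / std  # equals 1/c_out + 1/kernel

print(f"standard: {std} FMA, separable: {sep} FMA, ratio: {ratio:.3f}")
```

For any layer shape the ratio is 1/c_out + 1/kernel, so the saving grows with both the kernel size and the number of output channels.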

We propose Mobile Audio Streaming Networks (MASnet) for efficient low-latency speech enhancement, which is particularly suitable for mobile devices and other applications where computational capacity is a limitation. MASnet processes linear-scale spectrograms, transforming successive noisy frames into complex-valued ratio masks which are then applied to the respective noisy frames. MASnet can operate in a low-latency incremental inference mode which matches the complexity of layer-by-layer batch mode. Compared to a similar fully-convolutional architecture, MASnet incorporates depthwise and pointwise convolutions for a large reduction in fused multiply-accumulate operations per second (FMA/s), at the cost of some reduction in SNR.
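The masking step described in the abstract can be sketched as an element-wise complex multiplication of a predicted mask with the noisy spectrogram frame. This is a minimal illustration only: the frame and mask values are made up, the helper name `apply_complex_mask` is hypothetical, and the paper's actual mask parameterization and network are not reproduced here.

```python
def apply_complex_mask(noisy_frame, mask):
    """Apply a complex-valued ratio mask to one spectrogram frame, bin by bin."""
    if len(noisy_frame) != len(mask):
        raise ValueError("mask and frame must have the same number of frequency bins")
    # Complex multiplication scales the magnitude and rotates the phase of each bin.
    return [m * x for m, x in zip(mask, noisy_frame)]


# Toy frame with three frequency bins (illustrative values, not real audio).
noisy = [1 + 1j, 2 + 0j, 0 - 3j]
# Toy mask: attenuate bin 0, rotate bin 1 by 90 degrees, pass bin 2 unchanged.
mask = [0.5 + 0j, 0 + 1j, 1 + 0j]

enhanced = apply_complex_mask(noisy, mask)
print(enhanced)
```

Because the mask is complex, it can correct phase as well as magnitude, unlike a real-valued magnitude mask.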
