SD AS · Mar 27, 2021

On TasNet for Low-Latency Single-Speaker Speech Enhancement

Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen

arXiv:2103.14882v12.32 citations

Originality: Incremental advance

AI Analysis

This work addresses single-speaker speech enhancement, that is, noise reduction for a single target talker. The advance is incremental: it extends an existing method, TasNet, from speech separation to this related task.

The paper applies the time-domain audio separation network (TasNet) to single-speaker speech enhancement. It reports that TasNet improves on the state of the art, with the largest gains for modulated noise sources such as speech, and that it consistently outperforms a state-of-the-art single-speaker speech enhancement system. It also reports that performance drops for large frame hops.

In recent years, speech processing algorithms have seen tremendous progress, primarily due to the deep learning renaissance. This is especially true for speech separation, where the time-domain audio separation network (TasNet) has led to significant improvements. However, for the related task of single-speaker speech enhancement, which is of obvious importance, it is not yet known whether the TasNet architecture is equally successful. In this paper, we show that TasNet also improves the state of the art for speech enhancement, and that the largest gains are achieved for modulated noise sources such as speech. Furthermore, we show that TasNet learns an efficient inner-domain representation in which target and noise signal components are highly separable. This is especially true when the noise consists of interfering speech signals, which might explain why TasNet performs so well on the separation task. Additionally, we show that TasNet performs poorly for large frame hops and conjecture that aliasing might be the main cause of this performance drop. Finally, we show that TasNet consistently outperforms a state-of-the-art single-speaker speech enhancement system.
