SD, AS · Mar 27, 2021

On TasNet for Low-Latency Single-Speaker Speech Enhancement

arXiv:2103.14882v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement for applications that require noise reduction. Its contribution is incremental, since it extends an existing method to a related task.

The paper applies the time-domain audio separation network (TasNet) to single-speaker speech enhancement and shows that it improves on the state of the art, with the largest gains for modulated noise sources such as speech, and that it consistently outperforms a state-of-the-art single-speaker speech enhancement system.

In recent years, speech processing algorithms have seen tremendous progress, primarily due to the deep learning renaissance. This is especially true for speech separation, where the time-domain audio separation network (TasNet) has led to significant improvements. However, for the related task of single-speaker speech enhancement, which is of obvious importance, it is not yet known whether the TasNet architecture is equally successful. In this paper, we show that TasNet also improves the state of the art for speech enhancement, and that the largest gains are achieved for modulated noise sources such as speech. Furthermore, we show that TasNet learns an efficient inner-domain representation in which target and noise signal components are highly separable. This is especially true when the noise consists of interfering speech signals, which might explain why TasNet performs so well on the separation task. Additionally, we show that TasNet performs poorly for large frame hops and conjecture that aliasing might be the main cause of this performance drop. Finally, we show that TasNet consistently outperforms a state-of-the-art single-speaker speech enhancement system.
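To make "inner-domain representation" and "frame hop" concrete, here is a toy sketch of a TasNet-style encoder, mask, and decoder pipeline in NumPy. It is not the paper's implementation. Assumptions, all hypothetical: the analysis and synthesis matrices are fixed (`[I; -I]` and `[I, -I]`) rather than learned, frames are rectangular, the decoder divides by the overlap count so that an all-ones mask reconstructs the input exactly, and the mask is a placeholder for what a separator network would estimate. The function names `frame_signal`, `overlap_add`, `encode`, and `decode` are illustrative only.

```python
import numpy as np


def frame_signal(x: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into frames of length `win`, advancing by `hop` samples."""
    n_frames = (len(x) - win) // hop + 1
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape (n_frames, win)


def overlap_add(frames: np.ndarray, hop: int) -> np.ndarray:
    """Overlap-add frames of shape (n_frames, win) back into a 1-D signal."""
    n_frames, win = frames.shape
    out = np.zeros(hop * (n_frames - 1) + win)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + win] += frame
    return out


def encode(x: np.ndarray, basis: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Encoder: project each frame onto the basis rows, then rectify (ReLU).

    Returns a nonnegative representation of shape (n_frames, n_basis).
    """
    return np.maximum(frame_signal(x, win, hop) @ basis.T, 0.0)


def decode(w: np.ndarray, synth: np.ndarray, win: int, hop: int) -> np.ndarray:
    """Decoder: map representation vectors back to frames and overlap-add.

    Dividing by the per-sample overlap count is a simplification for this
    toy (a learned decoder would absorb such scaling itself).
    """
    frames = w @ synth.T  # (n_frames, win)
    norm = overlap_add(np.ones((frames.shape[0], win)), hop)
    return overlap_add(frames, hop) / norm


win, hop = 16, 8
rng = np.random.default_rng(0)
x = rng.standard_normal(win + hop * 31)  # length chosen to give exactly 32 frames

# Hypothetical fixed bases: ReLU([I; -I] f) keeps the positive and negative
# parts of each frame f separately, and [I, -I] recombines them into f.
basis = np.vstack([np.eye(win), -np.eye(win)])  # analysis, (2*win, win)
synth = np.hstack([np.eye(win), -np.eye(win)])  # synthesis, (win, 2*win)

w = encode(x, basis, win, hop)  # inner-domain representation
mask = np.ones_like(w)  # placeholder for a separator network's output in [0, 1]
y = decode(mask * w, synth, win, hop)
print(np.allclose(y, x))  # an all-ones mask reconstructs the input
```

In this framing the hop is the stride of a convolution: each basis channel is filtered and then decimated by `hop`. That gives one intuition for why large hops could introduce aliasing, as the paper conjectures, although the toy above does not reproduce that effect.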

