ASSD Sep 7, 2020

Toward Speech Separation in The Pre-Cocktail Party Problem with TasTas

arXiv:2009.03692v4, 1 citation, Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses speech separation for noisy environments, but it is incremental as it builds on existing methods like DPRNN-TasNet.

The paper tackles monaural speech separation in the pre-cocktail party problem using TasTas, achieving a 10.41 dB SDR improvement on the WSJ0-5mix corpus, which rises to 11.14 dB when online voice data remixing augmentation is used in training.

In this note, we propose to use TasTas \cite{shi2020speech} as an end-to-end approach to monaural speech separation in the pre-cocktail party problem. Our experiments on the public WSJ0-5mix data corpus result in a 10.41 dB SDR improvement. If online voice data remixing augmentation \cite{zeghidour2020wavesplit} is adopted in training, an 11.14 dB SDR improvement can be achieved. We have open-sourced our re-implementation of DPRNN-TasNet at https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation, and our TasTas is built on this implementation, so we believe the results in this paper can be reproduced with ease.
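To make the reported metric concrete, here is a minimal sketch of how an SDR improvement figure is typically read: the SDR of the separated estimate minus the SDR of the unprocessed mixture, both measured against the clean reference. It assumes the plain energy-ratio definition of SDR. The abstract does not say which SDR variant or evaluation toolkit was used, so the function names and the toy signals below are hypothetical and for illustration only.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: simple energy-ratio SDR, not necessarily the variant
    used in the paper's evaluation.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero for a perfect estimate.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def sdr_improvement_db(reference, estimate, mixture):
    """SDR of the separated estimate minus SDR of the raw mixture."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


# Toy signals (hypothetical): one clean source, a mixture with an
# equally loud interferer, and an estimate with a small residual error.
reference = [1.0, -1.0, 1.0, -1.0]
mixture = [2.0, 0.0, 2.0, 0.0]
estimate = [1.1, -0.9, 1.1, -0.9]

improvement = sdr_improvement_db(reference, estimate, mixture)
```

Read this way, a 10.41 dB improvement means the separated output's SDR is 10.41 dB higher than that of the unprocessed mixture, typically averaged over speakers and test utterances.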

Code Implementations: 1 repo
