SD AI CL LG MM AS · Oct 27, 2022

CasNet: Investigating Channel Robustness for Speech Separation

Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang

arXiv:2210.15370v1

Originality: Incremental advance

AI Analysis

This work addresses channel robustness in speech separation, which is crucial for real-world applications, but it is an incremental contribution: it builds on the existing TasNet architecture and reuses a previously constructed dataset.

The paper tackles channel mismatch in speech separation by proposing CasNet, a channel-aware audio separation network that incorporates channel embeddings into its separation module using the FiLM (feature-wise linear modulation) technique, and it shows improved performance over the TasNet baseline on the TAT-2mix corpus.

Recording channel mismatch between training and testing conditions has been shown to be a serious problem for speech separation: it greatly degrades separation performance and prevents such systems from meeting the requirements of everyday use. In this study, reusing our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation. CasNet is implemented on top of TasNet. A channel embedding (characterizing the channel information in a mixture of multiple utterances), generated by the Channel Encoder, is introduced into the separation module via the FiLM technique. Through two training strategies, we explore two roles that the channel embedding may play: 1) a real-life noise disturbance that makes the model more robust, or 2) a guide that instructs the separation model to retain the desired channel information. Experimental results on TAT-2mix show that CasNet trained with both training strategies outperforms the TasNet baseline, which does not use channel embeddings.
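FiLM is a general conditioning technique: a small network maps a conditioning vector to a per-feature scale γ and shift β, which then modulate hidden features as γ·h + β. The sketch below illustrates only that generic operation in plain Python. It assumes one linear layer each for γ and β, separator features laid out as feature channels × time frames, and a toy 2-dimensional embedding; the function names, shapes, and weights are hypothetical illustrations, not CasNet's actual implementation.

```python
from typing import List

# Hypothetical layout: rows = feature channels, columns = time frames.
Matrix = List[List[float]]


def linear(weight: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Affine map y = W x + b applied to a single vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weight, bias)]


def film(features: Matrix, embedding: List[float],
         w_gamma: Matrix, b_gamma: List[float],
         w_beta: Matrix, b_beta: List[float]) -> Matrix:
    """Feature-wise linear modulation: out[c][t] = gamma[c] * h[c][t] + beta[c].

    gamma and beta are predicted from the conditioning embedding (assumed here
    to be a channel embedding) and are shared across all time frames of
    feature channel c.
    """
    gamma = linear(w_gamma, b_gamma, embedding)
    beta = linear(w_beta, b_beta, embedding)
    if len(gamma) != len(features) or len(beta) != len(features):
        raise ValueError("FiLM projections must match the number of feature channels")
    return [[g * h + b for h in row] for row, g, b in zip(features, gamma, beta)]


if __name__ == "__main__":
    # Toy example: 2 feature channels x 3 frames, 2-dim (hypothetical) embedding.
    feats = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
    emb = [1.0, 0.0]
    out = film(feats, emb,
               w_gamma=[[2.0, 0.0], [0.0, 1.0]], b_gamma=[0.0, 1.0],
               w_beta=[[0.0, 0.0], [1.0, 0.0]], b_beta=[0.5, 0.0])
    print(out)  # gamma = [2.0, 1.0], beta = [0.5, 1.0]
```

Because γ and β depend only on the embedding, the same mixture features can be modulated differently for different recording channels without changing the separator's weights; where and how often CasNet applies this modulation is specified in the paper, not in this sketch.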

