AS SD May 19, 2020

Atss-Net: Target Speaker Separation via Attention-based Neural Network

Tingle Li, Qingjian Lin, Yuanyuan Bao, Ming Li

arXiv:2005.09200v18.041 citations

Originality: Incremental advance

AI Analysis

This work addresses speaker separation for audio processing applications, representing an incremental improvement over existing methods.

The paper tackles target speaker separation by proposing Atss-Net, an attention-based neural network in the spectrogram domain, which outperforms VoiceFilter with half the parameters and shows promise in speech enhancement.

Recently, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based models have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) in the spectrogram domain for the task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlations between features in parallel and to extract more features with shallower layers. Experimental results show that Atss-Net yields better performance than VoiceFilter, although it contains only half as many parameters. Furthermore, the proposed model also demonstrates promising performance in speech enhancement.
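
To make the idea of computing correlations between features in parallel more concrete, here is a minimal sketch of generic scaled dot-product self-attention applied to spectrogram frames. It is not the Atss-Net architecture itself, which the abstract does not specify. The function name `self_attention`, the projection matrices, and all dimensions (100 frames, 257 frequency bins, 64 attention dimensions) are assumptions chosen for illustration, and the input is random data rather than real audio.

```python
import numpy as np


def self_attention(frames, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention (illustrative only).

    frames: (T, F) array, one row per spectrogram time frame.
    w_q, w_k, w_v: (F, D) projection matrices (hypothetical, randomly
    initialized below; a trained model would learn them).
    Returns the attended features (T, D) and the attention weights (T, T).
    """
    q = frames @ w_q
    k = frames @ w_k
    v = frames @ w_v

    # One matrix product gives the similarity of every frame with every
    # other frame at once, instead of stepping through time like an LSTM.
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Numerically stable softmax over the last axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


# Assumed toy dimensions: 100 frames, 257 frequency bins, 64-dim attention.
rng = np.random.default_rng(0)
T, F, D = 100, 257, 64
frames = rng.standard_normal((T, F))
w_q, w_k, w_v = (rng.standard_normal((F, D)) * 0.05 for _ in range(3))

out, weights = self_attention(frames, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and describes how strongly one frame attends to every other frame, so long-range dependencies are captured by a single layer rather than by a deep recurrent stack.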
