SD · AS · Dec 4, 2021

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Xiaolin Hu, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann

arXiv:2112.02321v1 · 19.866 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses speech separation, a key problem in audio processing for applications like hearing aids and voice assistants, with an incremental improvement in model architecture.

The paper tackles speech separation by proposing an asynchronous updating scheme for a Fully Recurrent Convolutional Neural Network (FRCNN). The scheme achieves significantly better results than the traditional synchronous scheme while using fewer parameters, and the resulting model strikes a good balance between accuracy and efficiency on three benchmark datasets.

Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called the Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections that fuse information processed at various time-scales, represented by stages. In contrast to the traditional approach of updating stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously, and finally fuse information from all stages into the bottom stage. Experiments showed that this asynchronous updating scheme achieved significantly better results with far fewer parameters than the traditional synchronous updating scheme. In addition, the proposed model achieved a good balance between speech separation accuracy and computational efficiency compared with other state-of-the-art models on three benchmark datasets.
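The three-phase update order described above can be sketched on toy 1-D sequences. This is only an illustration of the ordering, not the authors' implementation. Each stage is assumed to be a plain list of floats at half the temporal resolution of the stage below it. The learned convolutions and fusion blocks are replaced by hypothetical stand-ins: `downsample` averages pairs of samples, `upsample` repeats samples, and `fuse` takes an element-wise mean. The names `async_frcnn_block` and `fuse_adjacent` are likewise illustrative.

```python
from typing import List

Seq = List[float]


def downsample(x: Seq) -> Seq:
    """Halve temporal resolution by averaging non-overlapping pairs
    (hypothetical stand-in for a strided convolution)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Seq, length: int) -> Seq:
    """Nearest-neighbour upsampling to `length`, assumed to be a multiple of len(x)
    (hypothetical stand-in for interpolation)."""
    factor = length // len(x)
    return [v for v in x for _ in range(factor)]


def fuse(*inputs: Seq) -> Seq:
    """Element-wise mean of equal-length sequences
    (hypothetical stand-in for a learned fusion block)."""
    return [sum(vals) / len(vals) for vals in zip(*inputs)]


def fuse_adjacent(stages: List[Seq]) -> List[Seq]:
    """Update every stage simultaneously from itself and its neighbours.
    All reads use the *input* states, so the update order does not matter."""
    out = []
    for i, s in enumerate(stages):
        inputs = [s]
        if i > 0:  # bottom-up input from the finer stage below
            inputs.append(downsample(stages[i - 1]))
        if i < len(stages) - 1:  # top-down input from the coarser stage above
            inputs.append(upsample(stages[i + 1], len(s)))
        out.append(fuse(*inputs))
    return out


def async_frcnn_block(x: Seq, num_stages: int) -> Seq:
    """One asynchronous pass: bottom-up, then adjacent fusion, then global fusion."""
    if len(x) % (2 ** (num_stages - 1)) != 0:
        raise ValueError("input length must be divisible by 2**(num_stages - 1)")
    # Phase 1: stages are computed one by one, bottom-up; stage i sees
    # the freshly computed stage i - 1.
    stages = [x]
    for _ in range(1, num_stages):
        stages.append(downsample(stages[-1]))
    # Phase 2: fuse information from adjacent stages simultaneously.
    stages = fuse_adjacent(stages)
    # Phase 3: bring every stage to the bottom resolution and fuse them all
    # into the bottom stage.
    n = len(stages[0])
    return fuse(*(upsample(s, n) for s in stages))


if __name__ == "__main__":
    signal = [float(i) for i in range(8)]
    print(async_frcnn_block(signal, num_stages=3))
```

Phase 2 on its own is a parallel update of all stages from their neighbours' current states. In this sketch, the asynchronous scheme differs because those states have already been refreshed bottom-up in Phase 1, and Phase 3 then collapses every time-scale into the finest stage.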

