Linjuan Cheng

SD
3 papers
5 citations
Novelty: 58%
AI Score: 23

3 Papers

AS · Feb 3, 2022
A deep complex multi-frame filtering network for stereophonic acoustic echo cancellation

Linjuan Cheng, Chengshi Zheng, Andong Li et al.

In hands-free communication systems, the coupling between the loudspeaker and the microphone generates an echo signal, which can severely degrade communication quality. Meanwhile, various types of noise in communication environments further reduce speech quality and intelligibility. It is difficult to extract the near-end signal from the microphone signal in a single step, especially in low signal-to-noise ratio scenarios. In this paper, we propose a deep complex network approach to address this issue. Specifically, we decompose stereophonic acoustic echo cancellation into two stages, a linear stereophonic acoustic echo cancellation module and a residual echo suppression module, both of which are based on deep learning architectures. A multi-frame filtering strategy is introduced to improve the estimation of the linear echo by capturing more inter-frame information. Moreover, we decouple complex spectral mapping into magnitude estimation and complex spectrum refinement. Experimental results demonstrate that the proposed approach achieves state-of-the-art performance compared with previous advanced algorithms under various conditions.
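The multi-frame filtering idea can be illustrated with a small NumPy sketch. Assuming STFT-domain processing, each time-frequency bin of the estimated linear echo is a weighted sum of the current frame and the K-1 previous frames of the far-end reference, with one complex weight per channel, tap, frame and bin, summed over the stereo reference channels. The function name `multi_frame_filter` and the array layout are hypothetical; in the paper the taps would be predicted by the network, whereas here they are simply passed in, and the authors' exact formulation may differ.

```python
import numpy as np

def multi_frame_filter(ref, taps):
    """Apply per-bin complex multi-frame filters to far-end reference spectra.

    Hypothetical helper, not the paper's implementation.
    ref:  complex array (C, T, F), STFT of C far-end reference channels
    taps: complex array (C, K, T, F), K filter taps per channel, frame and bin
    Returns the estimated linear echo spectrum, shape (T, F).
    """
    C, T, F = ref.shape
    K = taps.shape[1]
    # Zero-pad K-1 frames at the start so frame t can look back to t-K+1.
    padded = np.concatenate([np.zeros((C, K - 1, F), dtype=ref.dtype), ref], axis=1)
    echo = np.zeros((T, F), dtype=np.result_type(ref, taps))
    for k in range(K):
        # Reference delayed by k frames: ref[:, t-k] == padded[:, t-k+K-1].
        delayed = padded[:, K - 1 - k : K - 1 - k + T, :]
        # Multiply by the k-th complex tap and sum over reference channels.
        echo += np.sum(taps[:, k] * delayed, axis=0)
    return echo
```

Under this assumption, the linear stage's near-end estimate would be the microphone spectrum minus the returned echo, leaving the residual echo suppression stage to handle what a linear filter cannot model.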

SD · May 12, 2020
The IOA System for Deep Noise Suppression Challenge using a Framework Combining Dynamic Attention and Recursive Learning

Andong Li, Chengshi Zheng, Renhua Peng et al.

This technical report describes our system submitted to the Deep Noise Suppression Challenge and presents results for the non-real-time track. To refine the estimates stage by stage, we use recursive learning, a training protocol that aggregates information across multiple stages with a memory mechanism. An attention generator network is designed to dynamically control the feature distribution of the noise reduction network. To improve phase recovery accuracy, we adopt complex spectral mapping, decoding both the real and imaginary spectra. On the final blind test set, the average MOS improvements of the submitted system in the noreverb, reverb, and realrec categories are 0.49, 0.24, and 0.36, respectively.
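Why decoding both real and imaginary spectra helps phase recovery can be shown with oracle estimates on random data. The sketch below assumes STFT-domain complex spectra and compares a real-valued magnitude mask, which reuses the noisy phase, against a direct real/imaginary (RI) estimate. Both function names are hypothetical, and the "network outputs" are replaced by ideal values rather than anything from the submitted system. Even the ideal magnitude mask leaves an error because the noisy phase is kept, while the RI estimate can in principle match the clean spectrum exactly.

```python
import numpy as np

def magnitude_mask_estimate(noisy, mask):
    # Magnitude-domain enhancement: scale |Y| by a real mask, keep the noisy phase.
    return mask * np.abs(noisy) * np.exp(1j * np.angle(noisy))

def complex_mapping_estimate(real_out, imag_out):
    # Complex spectral mapping: real and imaginary parts are decoded directly,
    # so the phase of the estimate is not tied to the noisy phase.
    return real_out + 1j * imag_out

rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
noise = 0.5 * (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8)))
noisy = clean + noise

# Oracle outputs: the best each approach could possibly produce (illustration only).
ideal_mask = np.abs(clean) / np.abs(noisy)
est_mag = magnitude_mask_estimate(noisy, ideal_mask)
est_ri = complex_mapping_estimate(clean.real, clean.imag)

err_mag = np.mean(np.abs(est_mag - clean) ** 2)  # nonzero: phase mismatch remains
err_ri = np.mean(np.abs(est_ri - clean) ** 2)    # zero for the oracle RI estimate
```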

SD · Mar 22, 2020
A Time-domain Monaural Speech Enhancement with Feedback Learning

Andong Li, Chengshi Zheng, Linjuan Cheng et al.

In this paper, we propose a neural network with feedback learning in the time domain, called FTNet, for monaural speech enhancement. The proposed network consists of three principal components. The first is a stage recurrent neural network, introduced to effectively aggregate deep feature dependencies across different stages with a memory mechanism and to remove the interference stage by stage. The second is a convolutional auto-encoder. The third is a series of concatenated gated linear units, which facilitate information flow and gradually increase the receptive field. Feedback learning is adopted to improve parameter efficiency, so the number of trainable parameters is effectively reduced without sacrificing performance. Extensive experiments on the TIMIT corpus demonstrate that the proposed network achieves consistently better PESQ and STOI scores than two state-of-the-art time-domain baselines under different conditions.
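The receptive-field growth of stacked gated linear units can be sketched in NumPy. A GLU multiplies a linear convolution branch by a sigmoid gate computed from a second convolution. The abstract does not say how the receptive field is increased; this sketch assumes causal 1-D convolutions with exponentially increasing dilation (1, 2, 4, 8), a common choice in time-domain enhancement networks. `dilated_glu` and `receptive_field` are hypothetical helpers, not FTNet's actual layers.

```python
import numpy as np

def dilated_glu(x, w_lin, w_gate, dilation):
    """One causal dilated 1-D convolutional GLU (hypothetical, for illustration).

    x:      (C_in, T) input features
    w_lin:  (C_out, C_in, K) weights of the linear branch
    w_gate: (C_out, C_in, K) weights of the gate branch
    Returns (C_out, T) = conv(x, w_lin) * sigmoid(conv(x, w_gate)).
    """
    c_out, _, k = w_lin.shape
    T = x.shape[1]
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0)))  # left padding only, so the layer is causal
    lin = np.zeros((c_out, T))
    gate = np.zeros((c_out, T))
    for j in range(k):
        # Tap j reads x[t - (k - 1 - j) * dilation] for every output frame t.
        seg = xp[:, j * dilation : j * dilation + T]
        lin += w_lin[:, :, j] @ seg
        gate += w_gate[:, :, j] @ seg
    return lin / (1.0 + np.exp(-gate))  # linear branch times sigmoid gate

def receptive_field(kernel_size, dilations):
    # Each causal layer lets an output see (kernel_size - 1) * dilation more past samples.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Usage: push a unit impulse through four stacked GLUs with all-ones weights.
x = np.zeros((1, 40))
x[0, 0] = 1.0
y = x
for d in (1, 2, 4, 8):
    y = dilated_glu(y, np.ones((1, 1, 3)), np.ones((1, 1, 3)), d)
print(receptive_field(3, (1, 2, 4, 8)))  # 31
```

With kernel size 3, the four layers give a receptive field of 1 + 2(1 + 2 + 4 + 8) = 31 samples, so the impulse at t = 0 affects exactly the outputs at t = 0, ..., 30 while each layer adds only a handful of weights.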