Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise
This work addresses the joint reduction of acoustic echo, reverberation and noise for smart speaker applications, improving on cascaded approaches and on joint approaches that lack a spectral model.
The paper tackles the simultaneous reduction of acoustic echo, reverberation and noise in real scenarios. It proposes a joint optimization approach that models the target and residual signals with a multichannel Gaussian framework and represents their spectra with a neural network. In terms of overall distortion, the approach outperforms both cascaded methods and a joint method without a spectral model.
We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise. In real scenarios, these distortion sources may occur simultaneously, and reducing them requires combining the corresponding distortion-specific filters. As these filters interact with each other, they must be jointly optimized. We propose to model the target and residual signals after linear echo cancellation and dereverberation using a multichannel Gaussian modeling framework and to jointly represent their spectra by means of a neural network. We develop an iterative block-coordinate ascent algorithm to update all the filters. We evaluate our system on real recordings of acoustic echo, reverberation and noise acquired with a smart speaker in various situations. In terms of overall distortion, the proposed approach outperforms both a cascade of the individual approaches and a joint reduction approach that does not rely on a spectral model of the target and residual signals.
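The idea of block-coordinate ascent can be illustrated with a minimal toy sketch. This is not the paper's algorithm: the objective, the two scalar "blocks" x and y, and the function names are all hypothetical stand-ins, chosen only to show how interacting parameters can be updated one block at a time while the others are held fixed. In the actual system the blocks would presumably be the echo cancellation, dereverberation and residual reduction filters, and the objective a likelihood under the Gaussian model.

```python
def objective(x, y):
    # Toy concave objective with a coupling term x*y, so the two
    # "blocks" interact and cannot be optimized independently.
    # (Hypothetical example, not the likelihood used in the paper.)
    return -(x * x + y * y + x * y) + 4.0 * x + 5.0 * y


def block_coordinate_ascent(x=0.0, y=0.0, n_iter=50, tol=1e-12):
    # Alternately maximize the objective over one block while the
    # other is held fixed. Each update is the closed-form maximizer
    # obtained by setting the partial derivative to zero.
    history = [objective(x, y)]
    for _ in range(n_iter):
        x = (4.0 - y) / 2.0  # argmax over x with y fixed
        y = (5.0 - x) / 2.0  # argmax over y with x fixed
        history.append(objective(x, y))
        # Stop once a full sweep no longer improves the objective.
        if history[-1] - history[-2] < tol:
            break
    return x, y, history


x_opt, y_opt, history = block_coordinate_ascent()
```

Because each update exactly maximizes the objective over its block, the objective value never decreases from one sweep to the next, which is the property that makes this kind of scheme attractive for jointly optimizing filters that interact. In this toy case the iterates approach the joint maximizer at x = 1, y = 2.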