Lightweight Speech Enhancement in Unseen Noisy and Reverberant Conditions using KISS-GEV Beamforming
KISS-GEV provides a lightweight solution for real-time speech enhancement in applications such as hearing aids or communication systems, though the contribution is incremental, as it builds on existing beamforming methods.
The paper tackles speech enhancement in unseen noisy and reverberant conditions by proposing KISS-GEV beamforming, which uses the target's direction of arrival instead of neural-network mask estimation, reducing computational load and outperforming traditional Delay-and-Sum beamforming.
This paper introduces a new method referred to as KISS-GEV (for Keep It Super Simple Generalized Eigenvalue) beamforming. While GEV beamforming usually relies on a deep neural network to estimate target and noise time-frequency masks, this method uses a signal processing approach based on the direction of arrival (DoA) of the target. This considerably reduces the amount of computation involved at test time, and works for speech enhancement in unseen conditions, as there is no need to train a neural network with noisy speech. The proposed method can also be used to separate speech from a mixture, provided the speech sources come from different directions. The proposed method uses the same minimal DoA assumption as Delay-and-Sum beamforming, and results show that it outperforms this traditional approach.
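To make the GEV step concrete, the following is a minimal sketch of how beamforming weights can be obtained for a single frequency bin once target and noise spatial covariance matrices are available. It does not reproduce the paper's DoA-based procedure for estimating those matrices, which the abstract does not detail. The helper names (`steering_vector`, `gev_weights`, `apply_beamformer`), the far-field plane-wave model, the sign convention of the delays, the diagonal loading, and the example array geometry are all assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(mic_positions, doa_unit, freq_hz):
    """Far-field steering vector for one frequency (hypothetical helper).

    mic_positions: (M, 3) microphone coordinates in metres.
    doa_unit: (3,) unit vector pointing from the array toward the source.
    """
    # A plane wave reaches microphones closer to the source earlier.
    delays = -(mic_positions @ doa_unit) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def gev_weights(phi_ss, phi_nn, loading=1e-6):
    """Principal generalized eigenvector of the pair (phi_ss, phi_nn).

    This vector maximizes the output SNR w^H phi_ss w / w^H phi_nn w.
    """
    num_mics = phi_nn.shape[0]
    # Diagonal loading (an assumption) keeps phi_nn positive definite.
    scale = loading * np.trace(phi_nn).real / num_mics
    phi_nn = phi_nn + scale * np.eye(num_mics)
    # eigh(a, b) solves a v = lambda b v with eigenvalues in ascending
    # order, so the last column belongs to the largest eigenvalue.
    _, vecs = eigh(phi_ss, phi_nn)
    return vecs[:, -1]


def apply_beamformer(weights, stft_bin):
    """Filter-and-sum for one bin: y[t] = w^H x[t], stft_bin is (M, T)."""
    return weights.conj() @ stft_bin


# Example: 4-microphone linear array with 5 cm spacing (assumed geometry).
mics = np.array([[0.00, 0.0, 0.0],
                 [0.05, 0.0, 0.0],
                 [0.10, 0.0, 0.0],
                 [0.15, 0.0, 0.0]])
doa = np.array([1.0, 0.0, 0.0])           # source along the array axis
d = steering_vector(mics, doa, 1000.0)    # 1 kHz bin

phi_ss = np.outer(d, d.conj())            # rank-1 target covariance
phi_nn = np.eye(4, dtype=complex)         # spatially white noise
w = gev_weights(phi_ss, phi_nn)
```

In this idealized case (rank-1 target covariance and spatially white noise), the GEV weights point in the same direction as the steering vector `d`, so the beamformer coincides with Delay-and-Sum up to a complex scale factor. With a more realistic noise covariance, the GEV solution departs from Delay-and-Sum and attenuates directional interference, which is consistent with the improvement the abstract reports.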