W-Net BF: DNN-based Beamformer Using Joint Training Approach
This work addresses generalizability issues in audio signal enhancement for applications such as speech processing, though it appears incremental because it builds on existing DNN-based beamforming approaches.
The paper tackles acoustic beamforming by proposing a W-Net beamformer that combines DNN-powered generalized eigenvalue beamforming and DNN-based filter estimation in a single framework. The authors report that it outperforms the other methods on all tested evaluation metrics across diverse room and noise conditions.
Acoustic beamformers have been widely used to enhance audio signals. The best current methods are DNN-powered variants of the generalized eigenvalue beamformer and DNN-based filter-estimation methods that directly compute beamforming filters. Both approaches, while effective, have blind spots in their generalizability. We propose a novel approach that combines the two into a single framework that attempts to exploit the best features of each. The resulting model, called a W-Net beamformer, includes two components: the first computes a noise-masked reference, which the second uses to estimate beamforming filters. Results on data that include a wide variety of room and noise conditions, including static and mobile noise sources, show that the proposed beamformer outperforms other methods on all tested evaluation metrics.
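The two-stage data flow described above (a noise-masked reference feeding a filter estimator) can be illustrated with a minimal NumPy sketch. This is not the authors' model. In the W-Net beamformer both stages are neural networks, whereas here the mask is simply supplied by the caller and the filter estimator is replaced by a closed-form per-frequency least-squares solution. The function names `masked_reference`, `estimate_filters`, and `apply_filters`, the `(channels, frequencies, frames)` STFT layout, and the choice of channel 0 as the reference are all assumptions made for illustration.

```python
import numpy as np


def masked_reference(Y, mask, ref_channel=0):
    """Stage 1 stand-in: apply a time-frequency mask to one channel.

    Y    : complex multichannel STFT, shape (C, F, T)
    mask : real mask in [0, 1], shape (F, T); assumed to come from a DNN
    """
    return mask * Y[ref_channel]


def estimate_filters(Y, reference, eps=1e-8):
    """Stage 2 stand-in: per-frequency least-squares filters.

    For each frequency f, finds w minimizing sum_t |reference[f, t] - w^H y[f, t]|^2,
    which gives w = R^{-1} p with R the spatial covariance of the inputs and
    p the cross-correlation between the inputs and the reference.
    """
    C, F, T = Y.shape
    W = np.zeros((C, F), dtype=complex)
    for f in range(F):
        Yf = Y[:, f, :]                               # (C, T)
        R = Yf @ Yf.conj().T / T + eps * np.eye(C)    # (C, C), lightly regularized
        p = Yf @ reference[f].conj() / T              # (C,)
        W[:, f] = np.linalg.solve(R, p)
    return W


def apply_filters(Y, W):
    """Filter-and-sum: output[f, t] = sum_c conj(W[c, f]) * Y[c, f, t]."""
    return np.einsum("cf,cft->ft", W.conj(), Y)
```

A typical call chain would be `apply_filters(Y, estimate_filters(Y, masked_reference(Y, mask)))`. The closed-form solver is only a placeholder that makes the sketch runnable, standing in for the learned filter estimator that the abstract describes.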