ASSD · Nov 17, 2020

Implicit Filter-and-sum Network for Multi-channel Speech Separation

arXiv:2011.08401v1 · 5 citations
AI Analysis

This work addresses speech separation for applications using ad-hoc or fixed microphone arrays, representing an incremental improvement over existing methods.

The paper tackles multi-channel speech separation by proposing iFaSNet, which modifies the filter-and-sum network (FaSNet) with an implicit filter-and-sum operation in the latent space and feature-level normalized cross correlation features, resulting in significant performance improvements over FaSNet across all tested conditions.
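For orientation, the explicit time-domain filter-and-sum operation that iFaSNet moves away from can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `filter_and_sum`, the plain-list data layout, and the correlation-style (unflipped) filtering are assumptions made for the example. In FaSNet the per-channel filters would be estimated by a network; here they are simply passed in.

```python
def filter_and_sum(contexts, filters):
    """Explicit time-domain filter-and-sum over all microphones (sketch).

    contexts: one list of samples per channel, each of length L + K - 1
              (the frame to be separated plus its surrounding context).
    filters:  one FIR filter per channel, each of length K
              (hypothetical inputs; a network would estimate these).
    Returns the L-sample beamformed frame.
    """
    K = len(filters[0])
    L = len(contexts[0]) - K + 1
    out = [0.0] * L
    for ctx, h in zip(contexts, filters):
        # Slide the filter over the context window ("valid" positions only,
        # correlation form: the filter is not time-reversed).
        for n in range(L):
            out[n] += sum(h[k] * ctx[n + k] for k in range(K))
    return out


# Two channels, frame length 3, filter length 2.
frame = filter_and_sum(
    contexts=[[1, 2, 3, 4], [0, 1, 0, 1]],
    filters=[[1, 0], [0, 1]],
)
```

The implicit variant described in the summary applies the same idea to latent features of the reference microphone only, instead of to raw samples from every microphone.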

Various neural network architectures have been proposed in recent years for the task of multi-channel speech separation. Among them, the filter-and-sum network (FaSNet) performs end-to-end time-domain filter-and-sum beamforming and has been shown to be effective in both ad-hoc and fixed microphone array geometries. In this paper, we investigate multiple ways to improve the performance of FaSNet. From the problem formulation perspective, we change the explicit time-domain filter-and-sum operation, which involves all the microphones, into an implicit filter-and-sum operation in the latent space of only the reference microphone. The filter-and-sum operation is applied on a context around the frame to be separated. This allows the problem formulation to better match the objective of end-to-end separation. From the feature extraction perspective, we modify the calculation of sample-level normalized cross correlation (NCC) features into feature-level NCC (fNCC) features. This makes the model better match the implicit filter-and-sum formulation. Experimental results on both ad-hoc and fixed microphone array geometries show that the proposed modification to FaSNet, which we refer to as iFaSNet, is able to significantly outperform the benchmark FaSNet across all conditions with on-par model complexity.
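As a rough illustration of the sample-level NCC features mentioned in the abstract, the sketch below computes the cosine similarity between a frame from the reference microphone and every same-length sliding window in a context taken from another microphone. The function name `ncc` and the cosine-similarity definition are assumptions made for this example; the abstract does not spell out the formula, and the feature-level variant (fNCC) would apply a similar comparison to learned features rather than raw samples.

```python
import math


def ncc(ref_frame, context):
    """Sample-level normalized cross correlation (sketch).

    ref_frame: L samples from the reference microphone.
    context:   at least L samples from another microphone, centered
               on the same frame (assumed layout).
    Returns one cosine-similarity value per sliding-window position.
    """
    L = len(ref_frame)
    ref_norm = math.sqrt(sum(v * v for v in ref_frame))
    feats = []
    for start in range(len(context) - L + 1):
        window = context[start:start + L]
        win_norm = math.sqrt(sum(v * v for v in window))
        dot = sum(a * b for a, b in zip(ref_frame, window))
        denom = ref_norm * win_norm
        # Guard against all-zero windows to avoid division by zero.
        feats.append(dot / denom if denom > 0 else 0.0)
    return feats


# The second microphone sees the reference frame delayed by one sample.
features = ncc([1.0, 2.0], [0.0, 1.0, 2.0, 0.0])
best_lag = max(range(len(features)), key=lambda i: features[i])
```

The position of the largest value (`best_lag`) indicates the alignment at which the two channels match best, which is the kind of inter-channel delay cue such features carry.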

