SD · AS · Oct 22, 2021

Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

arXiv:2110.11844v3 · 2 citations
Originality: Highly original
AI Analysis

This paper addresses robust speech enhancement in flexible, real-world audio setups, for applications such as hearing aids and conferencing, and proposes a novel method for a known bottleneck.

The paper tackles speech enhancement for ad-hoc microphone arrays with unknown geometry. It proposes a triple-path network that uses self-attention for spatial processing and a dual-path attentive recurrent network for temporal processing. The authors report excellent performance in their experiments.

Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel triple-path network for ad-hoc array processing in the time domain. The key idea in the network design is to divide the overall processing into spatial processing and temporal processing and use self-attention for spatial processing. Using self-attention for spatial processing makes the network invariant to the order and the number of microphones. The temporal processing is done independently for all channels using a recently proposed dual-path attentive recurrent network. The proposed network is a multiple-input multiple-output architecture that can simultaneously enhance signals at all microphones. Experimental results demonstrate the excellent performance of the proposed approach. Further, we present analysis to demonstrate the effectiveness of the proposed network in utilizing multichannel information even from microphones at far locations.
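The abstract's claim that self-attention across microphones is insensitive to their order and number can be illustrated with a minimal NumPy sketch. This is not the authors' network: it is a single-head scaled dot-product attention with random, untrained projection matrices, and the names `channel_self_attention`, `w_q`, `w_k`, `w_v`, and `feat_dim` are hypothetical choices made for illustration. Because the sketch produces one output per microphone, reordering the inputs reorders the outputs in the same way, and the same weights accept any number of microphones.

```python
import numpy as np

def channel_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention across the microphone axis.

    x: array of shape (num_mics, feat_dim), one feature vector per microphone.
    w_q, w_k, w_v: (feat_dim, feat_dim) projections shared by all microphones.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[1])       # (num_mics, num_mics)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v                           # one output row per microphone

rng = np.random.default_rng(0)
feat_dim = 8  # assumed feature size, chosen only for this sketch
w_q, w_k, w_v = (rng.standard_normal((feat_dim, feat_dim)) for _ in range(3))

# Four microphones: shuffling the inputs shuffles the outputs identically.
x = rng.standard_normal((4, feat_dim))
y = channel_self_attention(x, w_q, w_k, w_v)
perm = rng.permutation(4)
y_perm = channel_self_attention(x[perm], w_q, w_k, w_v)
order_ok = np.allclose(y_perm, y[perm])

# Six microphones: the same weights apply without any change.
x6 = rng.standard_normal((6, feat_dim))
y6 = channel_self_attention(x6, w_q, w_k, w_v)
```

The projection matrices have no dimension tied to the microphone count, which is why the same parameters handle arrays of different sizes. A fixed-geometry model that concatenates channels in a set order would not have this property.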
