SD · AS · Nov 30, 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

arXiv:2011.15003v4 · 34 citations
AI Analysis

This work addresses speech separation in multi-channel reverberant environments.

This paper proposes a training objective for multi-channel reverberant speech separation: a loss based on the Convolutive Transfer Function Invariant Signal-to-Distortion Ratio (CI-SDR). On LibriSpeech-based reverberant mixtures, the proposed system comes within 1.2 percentage points of the error rate obtained on single-source non-reverberant input (LibriSpeech test_clean), outperforming by a large margin both a conventional permutation invariant training system and alternative objectives such as the Scale Invariant Signal-to-Distortion Ratio.

Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural network supported multi-channel source separation with a time-domain training objective function. For the objective we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show the effectiveness, we demonstrate the performance on LibriSpeech based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points, thus outperforming a conventional permutation invariant training based system and alternative objectives like Scale Invariant Signal-to-Distortion Ratio by a large margin.
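The idea behind a convolutive transfer function invariant SDR can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `delayed_copies` and `ci_sdr` are hypothetical, the test signal and toy filter are made up, and the default filter length of 512 taps is assumed from the usual BSS Eval setting. The sketch fits a short FIR filter that maps the reference onto the estimate by least squares, treats the filtered reference as the allowed target, and counts everything else as distortion. With a filter length of 1 the same function reduces to a scale-invariant SDR, which makes the difference between the two objectives visible.

```python
import numpy as np


def delayed_copies(reference, filter_length):
    """Return a matrix whose column l is `reference` delayed by l samples."""
    num_samples = len(reference)
    columns = [
        np.concatenate([np.zeros(lag), reference[:num_samples - lag]])
        for lag in range(filter_length)
    ]
    return np.stack(columns, axis=1)  # shape: (num_samples, filter_length)


def ci_sdr(reference, estimate, filter_length=512, eps=1e-12):
    """SDR in dB that ignores any short FIR filtering of the reference.

    Assumption: 512 taps mirrors the common BSS Eval default; the paper's
    exact configuration may differ.
    """
    A = delayed_copies(reference, filter_length)
    # Least-squares FIR filter that best maps the reference onto the estimate.
    fir, *_ = np.linalg.lstsq(A, estimate, rcond=None)
    target = A @ fir                # filtered reference: allowed distortion
    distortion = estimate - target  # everything the filter cannot explain
    return 10 * np.log10(
        (np.sum(target ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    )


# Toy example: the "estimate" is the reference passed through a short,
# made-up reverberation-like filter, with no other error.
rng = np.random.default_rng(0)
reference = rng.standard_normal(4000)
toy_filter = np.array([1.0, 0.0, 0.5, 0.0, 0.25])
estimate = np.convolve(reference, toy_filter)[: len(reference)]

ci = ci_sdr(reference, estimate, filter_length=16)  # filter-invariant
si = ci_sdr(reference, estimate, filter_length=1)   # scale-invariant only
loss = -ci  # a training loss would minimize the negative SDR
```

In this toy case the filter-invariant score is very high because the only difference between the signals is a short filter, whereas the scale-invariant score penalizes the filter's later taps as distortion. To use such a criterion for training, the same computation would need to be written in an automatic differentiation framework so that gradients can flow through the least-squares solve.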
