AS SD, Feb 21, 2022

The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO convolutional recurrent Network for Multi Channel Speech Enhancement and Speech Recognition

arXiv:2202.10017v1
Originality: Incremental advance
AI Analysis

This work addresses speech enhancement in noisy, reverberant environments. The advance is incremental because it builds on existing challenge frameworks.

The paper tackles multi-channel speech enhancement in a reverberant office environment by proposing a two-stage MIMO and MISO convolutional recurrent network, which placed 3rd in the L3DAS22 challenge with 3.2% WER and 0.972 STOI on the blind test set.
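For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration. The challenge's own scoring pipeline may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenizes on whitespace only, with no text
    normalization (casing, punctuation), which real scorers often apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 3.2% corresponds to roughly 3 word errors per 100 reference words in the recognizer's transcript of the enhanced speech.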

This paper describes the PCG-AIID system for the L3DAS22 challenge, Task 1: 3D speech enhancement in a reverberant office environment. We propose a two-stage framework to address multi-channel speech denoising and dereverberation. In the first stage, a multiple-input multiple-output (MIMO) network removes background noise while preserving the spatial characteristics of the multi-channel signals. In the second stage, a multiple-input single-output (MISO) network enhances the speech from the desired direction and performs post-filtering. As a result, our system ranked 3rd in the ICASSP 2022 L3DAS22 challenge and significantly outperformed the baseline system, achieving 3.2% WER and 0.972 STOI on the blind test set.
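The data flow of the two-stage framework can be sketched as follows. This is only an illustration of the channel counts at each stage, not the authors' networks: `mimo_stage` stands in for the MIMO network with a single shared gain (which leaves the level ratios between channels unchanged), and `miso_stage` stands in for the MISO network with plain channel averaging. Both functions, their names, and the `gain` parameter are hypothetical placeholders.

```python
from typing import List

Signal = List[float]         # one channel of samples
MultiChannel = List[Signal]  # C channels, each with T samples


def mimo_stage(mixture: MultiChannel, gain: float = 0.5) -> MultiChannel:
    """Placeholder for stage 1: C channels in, C channels out.

    A single shared gain is applied to every channel, so the relative
    levels between channels are unchanged. The real system uses a
    convolutional recurrent network here.
    """
    return [[gain * sample for sample in channel] for channel in mixture]


def miso_stage(denoised: MultiChannel) -> Signal:
    """Placeholder for stage 2: C channels in, one channel out.

    Channels are simply averaged sample by sample. The real system uses
    a second network that also performs post-filtering.
    """
    num_channels = len(denoised)
    return [sum(samples) / num_channels for samples in zip(*denoised)]


def enhance(mixture: MultiChannel) -> Signal:
    """Run the two stages in sequence: MIMO first, then MISO."""
    return miso_stage(mimo_stage(mixture))


# Two channels, two samples each.
mixture = [[1.0, 2.0], [3.0, 4.0]]
enhanced = enhance(mixture)
```

The point of the split is that the first stage keeps one output per microphone channel, so the second stage still has multi-channel input from which to extract a single enhanced signal.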

