AS SD Apr 8, 2019

Improved Speaker-Dependent Separation for CHiME-5 Challenge

Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

arXiv:1904.03792v15.95 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses speech recognition challenges in real-world noisy environments like the CHiME-5 challenge, but it is incremental as it builds on prior submissions.

The paper tackles multi-channel, highly overlapped conversational speech recognition in noisy dinner-party scenarios by improving a speaker-dependent separation system. The improved separation front-end yields a 10% absolute WER reduction over the 80.28% baseline on the development set, and an improved back-end acoustic model lowers WER further to 60.15%.

This paper summarizes several follow-up contributions for improving our submitted NWPU speaker-dependent system for the CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberation and non-stationary noise. We adopt a speaker-aware training method that uses the i-vector as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce WER to 60.15%, which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.
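The abstract reports its gains as absolute WER differences, which are easy to confuse with relative reductions. The short sketch below uses only the two figures stated in the abstract (the 80.28% baseline and the final 60.15%). The function names are illustrative and do not come from the paper.

```python
def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in percentage points."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Figures quoted in the abstract (development set).
baseline = 80.28  # previous baseline WER, in percent
final = 60.15     # WER with improved separation and back-end acoustic model

abs_red = absolute_reduction(baseline, final)
rel_red = relative_reduction(baseline, final)
print(f"absolute: {abs_red:.2f} points, relative: {rel_red:.1f}%")
```

Overall, the reported numbers correspond to roughly 20 points of absolute WER reduction, or about a quarter of the baseline error in relative terms.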
