CL AI SD AS SP · Feb 15, 2018

Deep Learning Based Speech Beamforming

Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson

arXiv:1802.05383v13.236 citations

Originality: Incremental advance

AI Analysis

This work addresses speech enhancement for applications like hearing aids or communication systems by offering a hybrid solution that mitigates limitations of existing methods, though it appears incremental in nature.

The paper tackles the challenge of multi-channel speech enhancement with ad-hoc sensors by proposing DEEPBEAM, a framework that combines beamforming and deep learning to produce clean, natural-sounding speech robust to unseen noise, as demonstrated in experiments on synthetic and real-world data.

Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech-model-guided beamforming algorithms are able to recover natural-sounding speech, but the speech models tend to be oversimplified, since richer models would make inference too complicated. On the other hand, deep-learning-based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels. Deep learning approaches also introduce many errors, particularly in the presence of unseen noise types and settings. We therefore propose an enhancement framework called DEEPBEAM, which combines the two complementary classes of algorithms. DEEPBEAM uses a beamforming filter to produce natural-sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DEEPBEAM is able to produce clean, dry and natural-sounding speech, and is robust against unseen noise.
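The idea of letting a single-channel estimate set the coefficients of a multi-channel filter can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say how the network output is used, so the least-squares fit below is an assumption, and `fit_beamformer`, `apply_beamformer`, and `pseudo_target` are hypothetical names. The clean signal stands in for the output of a monaural enhancement network, which is not implemented here. The sketch does show one property mentioned in the abstract: a filter-and-sum beamformer accepts any number of input channels.

```python
import numpy as np

def fit_beamformer(noisy, target, taps):
    """Fit per-channel FIR filters by least squares (an assumed fitting rule).

    noisy:  array of shape (channels, samples)
    target: array of shape (samples,), a stand-in for a network's speech estimate
    Returns weights of shape (channels, taps).
    """
    channels, samples = noisy.shape
    columns = []
    for c in range(channels):
        for d in range(taps):
            # Column holds channel c delayed by d samples (zero-padded at the start).
            col = np.zeros(samples)
            col[d:] = noisy[c, :samples - d]
            columns.append(col)
    design = np.stack(columns, axis=1)  # shape (samples, channels * taps)
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    return weights.reshape(channels, taps)

def apply_beamformer(noisy, weights):
    """Filter each channel with its FIR filter and sum the results."""
    channels, samples = noisy.shape
    out = np.zeros(samples)
    for c in range(channels):
        out += np.convolve(noisy[c], weights[c])[:samples]
    return out

# Toy data: a sine wave observed by four noisy sensors.
rng = np.random.default_rng(0)
n = 2000
clean = np.sin(2 * np.pi * 5 * np.arange(n) / n)
noisy = np.stack([clean + 0.5 * rng.standard_normal(n) for _ in range(4)])

# Hypothetical stand-in for the monaural network's output.
pseudo_target = clean

weights = fit_beamformer(noisy, pseudo_target, taps=8)
enhanced = apply_beamformer(noisy, weights)
```

Because the output is a linear filtering of the recorded channels, the result stays tied to the observed signals rather than being synthesized from scratch, which is the intuition behind using a beamformer as the final stage.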
