SD · HC · AS · Nov 29, 2017

Stream Attention for far-field multi-microphone ASR

arXiv:1711.11141v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses speech recognition challenges in noisy, distant environments for applications like smart devices, though it appears incremental as it builds on existing attention mechanisms.

The paper tackles far-field automatic speech recognition in multi-microphone setups by applying a stream attention framework to DNN posterior probabilities. Experiments on real recorded data show substantial improvements in word error rate (WER).

A stream attention framework is applied to the posterior probabilities of a deep neural network (DNN) to improve far-field automatic speech recognition (ASR) performance in a multi-microphone configuration. The stream attention scheme is realized through an attention vector, which is derived by predicting the ASR performance from the phoneme posterior distribution of each individual microphone stream, focusing the recognizer's attention on the more reliable microphones. Various ASR performance measures are investigated using a real recorded dataset. Experimental results show that the proposed framework yields substantial improvements in word error rate (WER).
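The idea of weighting microphone streams by an attention vector can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the per-stream reliability score is the negative mean entropy of the phoneme posteriors (a simple stand-in for the ASR performance measures the paper investigates), the attention vector is a softmax over those scores, and the streams are combined by a frame-wise weighted sum. The function names, the `temperature` parameter, and the toy data are all hypothetical.

```python
import math


def mean_entropy(posteriors):
    """Average per-frame entropy (in nats) of one stream's phoneme posteriors."""
    total = 0.0
    for frame in posteriors:
        total += -sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def attention_vector(streams, temperature=1.0):
    """Softmax over negative mean entropy: a sharper stream gets a larger weight.

    Assumption: entropy is used here as a proxy for predicted ASR performance.
    """
    scores = [-mean_entropy(s) / temperature for s in streams]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def combine_streams(streams, weights):
    """Frame-by-frame weighted sum of the per-stream posteriors."""
    n_frames = len(streams[0])
    n_classes = len(streams[0][0])
    combined = []
    for t in range(n_frames):
        frame = [
            sum(w * s[t][k] for w, s in zip(weights, streams))
            for k in range(n_classes)
        ]
        combined.append(frame)
    return combined


# Toy example: two microphones, two frames, three phoneme classes.
confident_mic = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]]
noisy_mic = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]]
streams = [confident_mic, noisy_mic]

weights = attention_vector(streams)
combined = combine_streams(streams, weights)
```

In this toy case the stream with sharper posteriors receives the larger weight, and each combined frame remains a valid probability distribution because the weights sum to one.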

