AS, LG, SD (Oct 25, 2018)

Speaker Selective Beamformer with Keyword Mask Estimation

arXiv:1810.10727v2, 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses speaker-specific ASR in noisy environments, such as those in which smart speakers operate, but it is incremental because it builds on existing beamforming and mask estimation techniques.

The paper tackles the problem of automatic speech recognition for a target speaker in background speech by using a wakeup keyword to separate and enhance the target's speech, resulting in significantly improved character error rates in Japanese ASR experiments.

This paper addresses the problem of automatic speech recognition (ASR) of a target speaker in the presence of background speech. The novelty of our approach is that we focus on a wakeup keyword, which is usually used for activating ASR systems such as smart speakers. The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. The separated signals are then used to calculate a beamforming filter that enhances the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask can selectively separate the keyword and background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, and we confirm that the character error rates are significantly improved by the proposed method for both simulated and real recorded test sets.
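The two-stage pipeline in the abstract (masks estimated on the keyword segment, then a beamforming filter applied to later speech) can be illustrated with a minimal NumPy sketch. The abstract does not say which beamformer is used, so the MVDR formulation below is an assumption chosen for illustration, and the masks are taken as given inputs rather than produced by a DNN. The function names, array shapes, and the `diag_load` parameter are all hypothetical.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)  # avoid division by zero
    return cov / norm[:, None, None]


def mvdr_weights(cov_target, cov_noise, ref_channel=0, diag_load=1e-6):
    """MVDR filter per frequency bin (assumed beamformer, not from the paper).

    Uses the reference-channel form w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s),
    where u selects ref_channel. Returns shape (freqs, channels).
    """
    n_ch = cov_target.shape[-1]
    # Diagonal loading keeps the noise covariance invertible.
    scale = np.trace(cov_noise, axis1=-2, axis2=-1).real / n_ch
    loaded = cov_noise + (diag_load * scale)[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_target)  # Phi_n^-1 Phi_s
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[:, :, ref_channel] / (trace[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Apply y[t, f] = w[f]^H x[:, t, f]; returns shape (frames, freqs)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch, the keyword mask and its complement (the background mask) would each be passed to `spatial_covariance` together with the multichannel STFT of the keyword segment, and the resulting filter from `mvdr_weights` would then be applied with `apply_beamformer` to the STFT of the utterance that follows the keyword.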

