SDLGAS · Apr 17, 2019

Hard Sample Mining for the Improved Retraining of Automatic Speech Recognition

arXiv:1904.08031v1 · 10 citations
AI Analysis

This work addresses the challenge of efficiently enhancing ASR performance in target domains, though it appears incremental as it builds on existing retraining and mining techniques.

The paper tackles the problem of improving automatic speech recognition (ASR) systems by retraining them with hard samples, which are sparse and costly to label manually, and proposes a mining method that the authors report achieves the best performance on an End2End ASR task.

Retraining with more and more new training data from the target domain is an effective way to improve the performance of existing Automatic Speech Recognition (ASR) systems. Recently, Deep Neural Networks (DNNs) have become successful models in the ASR field. When training DNN-based methods, the error between the output transcription and the corresponding annotated text is back-propagated to update and optimize the parameters. Thus, the parameters are influenced more by training samples with a large propagation error than by samples with a small one. In this paper, we define samples with a significant error as hard samples and try to improve the performance of the ASR system by adding many of them to the training data. Unfortunately, hard samples are sparse in the training data of the target domain, and labeling them manually is expensive. Therefore, we propose a hard sample mining method based on enhanced deep multiple instance learning, which can find hard samples in unlabeled training data by using a small, manually labeled subset of the target-domain dataset. We applied our method to an End2End ASR task and obtained the best performance.
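Two ideas in the abstract can be made concrete with a toy example. First, the claim that high-error samples dominate the parameter updates: for softmax with cross-entropy, the gradient with respect to the logits is the predicted distribution minus the one-hot target, so its norm grows as the prediction moves away from the label. Second, the mining step: the sketch below uses the standard multiple-instance-learning (MIL) assumption that a bag (here, an utterance) is hard if any of its instances (segments) is hard, so a bag's score is the maximum of its instance scores. This is a minimal illustration under those assumptions, not the authors' enhanced deep MIL method. The function names `ce_loss_and_grad`, `bag_score`, and `mine_hard`, and the per-segment hardness scores (assumed to come from some classifier trained on the small labeled subset), are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_loss_and_grad(logits: Sequence[float], target: int) -> Tuple[float, List[float]]:
    """Cross-entropy loss for one output and its gradient w.r.t. the logits.

    For softmax + cross-entropy the gradient is (p - one_hot(target)), so it is
    small when the prediction already matches the label and large when it does not.
    """
    p = softmax(logits)
    loss = -math.log(p[target])
    grad = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    return loss, grad


def grad_norm(grad: Sequence[float]) -> float:
    """Euclidean norm of a gradient vector (size of the update signal)."""
    return math.sqrt(sum(g * g for g in grad))


def bag_score(instance_scores: Sequence[float]) -> float:
    """Standard MIL assumption: a bag is positive (hard) if any instance is,
    so the bag score is the maximum instance score (max-pooling)."""
    return max(instance_scores)


def mine_hard(bags: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k unlabeled utterances (bags) with the highest MIL hardness score.

    The per-segment scores are HYPOTHETICAL outputs of an instance-level
    hardness classifier assumed to be trained on the small labeled subset.
    """
    ranked = sorted(bags, key=lambda name: bag_score(bags[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # An "easy" sample: the model already favours the correct class 0.
    easy_loss, easy_grad = ce_loss_and_grad([4.0, 0.0, 0.0], target=0)
    # A "hard" sample: the model confidently predicts the wrong class 1.
    hard_loss, hard_grad = ce_loss_and_grad([0.0, 4.0, 0.0], target=0)
    print(f"easy: loss={easy_loss:.3f} grad_norm={grad_norm(easy_grad):.3f}")
    print(f"hard: loss={hard_loss:.3f} grad_norm={grad_norm(hard_grad):.3f}")

    # Hypothetical per-segment hardness scores for three unlabeled utterances.
    unlabeled = {
        "utt_a": [0.10, 0.20, 0.15],
        "utt_b": [0.05, 0.90, 0.10],
        "utt_c": [0.30, 0.40, 0.35],
    }
    print(mine_hard(unlabeled, k=2))  # prints ['utt_b', 'utt_c']
```

Max-pooling is the simplest MIL aggregation and lets a single very hard segment flag a whole utterance; the abstract does not describe how the authors' enhanced variant aggregates instances or scores them.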

