AS · CL · SD · May 20, 2020

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

arXiv:2005.09862v2 · 31 citations
AI Analysis

This incremental study addresses the problem of reducing transcribed data needs for speech recognition systems, benefiting developers in resource-constrained settings.

The paper investigated three aspects of Masked Predictive Coding for unsupervised pre-training in speech recognition: the impact of pre-training data speaking style, its extension to streaming models, and improved knowledge transfer to downstream tasks. Results included an 8.46% relative error reduction on a streaming model with HKUST data and a 3.99% reduction on AISHELL with enhanced transfer techniques.
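For readers unfamiliar with the metric, the relative error reductions quoted above measure improvement as a fraction of the baseline error rate, not as an absolute difference. A minimal sketch of that arithmetic follows; the function name and the error rates in the example are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a percentage of the baseline error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates (in percent), NOT results from the paper:
# going from 10.0 to 9.0 is a 1.0-point absolute drop but a 10% relative drop.
reduction = relative_error_reduction(10.0, 9.0)
print(f"{reduction:.2f}% relative error reduction")
```

Under this definition, a given relative reduction corresponds to a smaller absolute gain when the baseline error is already low.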

Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Coding (MPC) achieved significant improvements on various speech recognition datasets with a BERT-like masked reconstruction loss and a Transformer backbone. However, many aspects of MPC have not been fully investigated. In this paper, we conduct a further study of MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks. Experiments revealed that pre-training data with a matching speaking style is more useful on downstream recognition tasks. A unified training objective combining Autoregressive Predictive Coding (APC) and MPC provided an 8.46% relative error reduction on a streaming model trained on HKUST. In addition, the combination of target data adaptation and layer-wise discriminative training helped the knowledge transfer of MPC, achieving a 3.99% relative error reduction on AISHELL over a strong baseline.
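The BERT-like masked reconstruction idea mentioned in the abstract can be illustrated with a small sketch: some input frames are hidden, and the loss is computed only on the hidden positions. This is not the paper's implementation. The 15% masking ratio, zero-filling of masked frames, the L1 distance, and the names `mask_frames` and `masked_l1_loss` are all assumptions made for illustration, and a real system would place a Transformer encoder between the corrupted input and the prediction.

```python
import random


def mask_frames(frames, mask_ratio=0.15, seed=0):
    """Zero out a random subset of frames; return the corrupted copy and masked indices.

    mask_ratio and zero-filling are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    num_masked = max(1, round(mask_ratio * len(frames)))
    masked = set(rng.sample(range(len(frames)), num_masked))
    corrupted = [
        [0.0] * len(frame) if i in masked else list(frame)
        for i, frame in enumerate(frames)
    ]
    return corrupted, sorted(masked)


def masked_l1_loss(predicted, target, masked):
    """Mean absolute error computed over the masked frames only."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(predicted[i], target[i]):
            total += abs(p - t)
            count += 1
    return total / count


# Toy "acoustic features": 20 frames with 4 dimensions each (hypothetical data).
frames = [[float(t + d + 1) for d in range(4)] for t in range(20)]
corrupted, masked = mask_frames(frames)

# Stand-in for the model: an identity map that simply returns the corrupted input.
# A trained encoder would instead try to reconstruct the original masked frames.
loss = masked_l1_loss(corrupted, frames, masked)
print(f"masked frames: {masked}, reconstruction loss: {loss:.3f}")
```

Restricting the loss to masked positions means the model gets no credit for copying frames it can already see, so it has to infer the hidden frames from their context.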

Code Implementations: 1 repo
