CL, Apr 8, 2019

Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

arXiv:1904.03802v2, 20 citations
Originality: Incremental advance
AI Analysis

This paper addresses the lack of code-switched training data for ASR systems and offers a practical approach to multilingual speech recognition in resource-constrained settings. It is an incremental contribution, since it builds on existing methods.

The paper tackles the problem of training end-to-end code-switching automatic speech recognition models without code-switched data by using only monolingual data, achieving up to a 4.5% absolute mixed error rate improvement on a Mandarin-English task.

The lack of code-switched training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence helps the ASR model code-switch between languages more easily. Specifically, we propose to use constraints based on Jensen-Shannon divergence and cosine distance. The former enforces the output embeddings of the monolingual languages to have similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
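The two constraints named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the embedding distributions are modelled, so the sketch assumes each language's output embeddings are summarised by a diagonal Gaussian and estimates the Jensen-Shannon divergence by Monte Carlo sampling. The function names, the sample count, and the variance floor `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def centroid_cosine_distance(emb_a, emb_b):
    """Cosine distance (1 - cosine similarity) between the centroids of two embedding sets."""
    ca, cb = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cos = ca @ cb / (np.linalg.norm(ca) * np.linalg.norm(cb))
    return 1.0 - cos


def _diag_gauss_logpdf(x, mean, var):
    """Log-density of a diagonal Gaussian, evaluated row-wise on x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)


def gaussian_jsd(emb_a, emb_b, n_samples=4096, seed=0, eps=1e-6):
    """Monte Carlo estimate of the Jensen-Shannon divergence (in nats).

    ASSUMPTION: each embedding set is modelled as a diagonal Gaussian fitted
    to its rows. The source abstract does not specify this modelling choice.
    """
    rng = np.random.default_rng(seed)
    mu_a, var_a = emb_a.mean(axis=0), emb_a.var(axis=0) + eps
    mu_b, var_b = emb_b.mean(axis=0), emb_b.var(axis=0) + eps
    dim = emb_a.shape[1]

    # Draw samples from each fitted Gaussian.
    xa = mu_a + np.sqrt(var_a) * rng.standard_normal((n_samples, dim))
    xb = mu_b + np.sqrt(var_b) * rng.standard_normal((n_samples, dim))

    def log_mix(x):
        # log m(x), where m = 0.5 * p + 0.5 * q is the mixture of both Gaussians.
        return np.logaddexp(
            _diag_gauss_logpdf(x, mu_a, var_a),
            _diag_gauss_logpdf(x, mu_b, var_b),
        ) - np.log(2.0)

    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M)
    kl_pm = np.mean(_diag_gauss_logpdf(xa, mu_a, var_a) - log_mix(xa))
    kl_qm = np.mean(_diag_gauss_logpdf(xb, mu_b, var_b) - log_mix(xb))
    return 0.5 * (kl_pm + kl_qm)
```

In training, penalties of this kind would presumably be added to the ASR loss with a weighting factor and computed in a framework that supports automatic differentiation; the NumPy version above only shows what each quantity measures. Both values shrink toward zero as the two sets of embeddings come to overlap, and the divergence is bounded above by log 2.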

