CL · Sep 21, 2021

Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation

arXiv:2110.10261v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific challenge in ASR for multilingual task adaptation, but it is incremental as it builds on existing NMT and LM techniques.

The paper tackles the problem of building automatic speech recognition language models for task-specific scenarios when text data is available only in a different language. It uses neural machine translation to translate that text and shows, with experiments on the WMT20 chat translation dataset, that language models trained on confusion networks derived from NMT beam search graphs reach lower perplexity than those trained on N-best translations.

Automatic Speech Recognition (ASR) systems have been gaining popularity in recent years for their widespread usage in smartphones and speakers. Building ASR systems for task-specific scenarios is subject to the availability of utterances that adhere to the style of the task as well as the language in question. In our work, we target such a scenario wherein task-specific text data is available in a language that is different from the target language in which an ASR Language Model (LM) is expected. We use Neural Machine Translation (NMT) as an intermediate step to first obtain translations of the task-specific text data. We then train LMs on the 1-best and N-best translations and study ways to improve on such a baseline LM. We develop a procedure to derive word confusion networks from NMT beam search graphs and evaluate LMs trained on these confusion networks. With experiments on the WMT20 chat translation task dataset, we demonstrate that NMT confusion networks can help to reduce the perplexity of both n-gram and recurrent neural network LMs compared to those trained only on N-best translations.
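The abstract does not spell out how the confusion networks are built or how LMs are trained on them, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a confusion network is a list of slots mapping candidate words to posterior probabilities, expands it into weighted word sequences, collects fractional bigram counts, and measures perplexity with simple additive smoothing. The `<eps>` skip symbol, the function names, the toy network, and the smoothing constant are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

EPS = "<eps>"  # hypothetical symbol for an empty (skip) arc


def expand_paths(confusion_network):
    """Enumerate every path through the network with its probability."""
    paths = []
    for choice in product(*(slot.items() for slot in confusion_network)):
        words = [w for w, _ in choice if w != EPS]
        prob = math.prod(p for _, p in choice)
        paths.append((words, prob))
    return paths


def train_bigram_lm(weighted_sentences):
    """Collect fractional bigram counts from (words, weight) pairs."""
    bigram = defaultdict(float)
    context = defaultdict(float)
    vocab = {"</s>"}
    for words, weight in weighted_sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(tokens, tokens[1:]):
            bigram[(prev, cur)] += weight
            context[prev] += weight
    return bigram, context, vocab


def perplexity(lm, sentences, alpha=0.1):
    """Perplexity under an additively smoothed bigram model."""
    bigram, context, vocab = lm
    v = len(vocab) + 1  # one extra slot for unseen words
    log_prob, n = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            num = bigram.get((prev, cur), 0.0) + alpha
            den = context.get(prev, 0.0) + alpha * v
            log_prob += math.log(num / den)
            n += 1
    return math.exp(-log_prob / n)


# Toy network: three slots, the last one may be skipped.
network = [
    {"i": 1.0},
    {"need": 0.6, "want": 0.4},
    {"help": 0.9, EPS: 0.1},
]
cn_lm = train_bigram_lm(expand_paths(network))
one_best_lm = train_bigram_lm([(["i", "need", "help"], 1.0)])

held_out = [["i", "want", "help"]]
print("confusion-network LM:", perplexity(cn_lm, held_out))
print("1-best LM:", perplexity(one_best_lm, held_out))
```

Exhaustive path expansion grows exponentially with the number of slots, so this only works for toy inputs. A practical system would more likely accumulate expected n-gram counts directly from the slot posteriors or sample paths from the network.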
