AS | CL | SD | Sep 14, 2023

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

arXiv:2309.07369v2 | 2 citations | h-index: 18
Originality: Incremental advance
AI Analysis

The paper addresses the problem of efficient language model adaptation when deploying speech recognition systems in industry, and represents an incremental improvement.

The paper tackles the challenge of text adaptation in attention-based encoder-decoder speech recognition models by proposing a hybrid model that separates the acoustic and language models, yielding a 23% relative Word Error Rate improvement when out-of-domain text data is used for language model adaptation.

The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years. However, the joint optimization of the acoustic model and language model in an end-to-end manner has created challenges for text adaptation. In particular, effective, quick and inexpensive adaptation with text input has become a primary concern for deploying AED systems in the industry. To address this issue, we propose a novel model, the hybrid attention-based encoder-decoder (HAED) speech recognition model, which preserves the modularity of conventional hybrid automatic speech recognition systems. Our HAED model separates the acoustic and language models, allowing for the use of conventional text-based language model adaptation techniques. We demonstrate that the proposed HAED model yields 23% relative Word Error Rate (WER) improvements when out-of-domain text data is used for language model adaptation, with only a minor degradation in WER on a general test set compared with the conventional AED model.
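
For readers unfamiliar with the metric, a relative WER improvement is the drop in WER expressed as a fraction of the baseline WER. The sketch below shows that arithmetic only. The function name and the two WER values are hypothetical, chosen so the result matches the 23% figure; they are not the paper's reported absolute numbers.

```python
def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values for illustration only; not taken from the paper.
baseline = 10.0  # WER (%) before language model adaptation
adapted = 7.7    # WER (%) after language model adaptation

print(f"{relative_wer_improvement(baseline, adapted):.0%}")
```

Under these assumed values an absolute drop of 2.3 WER points corresponds to a 23% relative improvement, which is why relative and absolute figures should not be compared directly.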
