CL SD AS May 18, 2023

A Lexical-aware Non-autoregressive Transformer-based ASR Model

arXiv:2305.10839v1
Originality: Incremental advance
AI Analysis

This work addresses the need for faster and more accurate ASR systems, particularly in applications that require real-time processing. It is an incremental advance because it builds on existing non-autoregressive methods.

The paper aims to improve non-autoregressive automatic speech recognition (ASR) by proposing a lexical-aware non-autoregressive Transformer-based framework (LA-NAT) that incorporates linguistic knowledge. It reports better results than other recent non-autoregressive ASR models on AISHELL-1, CSJ, and TEDLIUM 2, and decoding that is about 58 times faster than the classic autoregressive model.

Non-autoregressive automatic speech recognition (ASR) has become a mainstream approach to ASR modeling because of its fast decoding speed and satisfactory results. To further boost performance, relaxing the conditional independence assumption and cascading large-scale pre-trained models are two active research directions. In addition to these strategies, we propose a lexical-aware non-autoregressive Transformer-based (LA-NAT) ASR framework, which consists of an acoustic encoder, a speech-text shared encoder, and a speech-text shared decoder. The acoustic encoder processes the input speech features as usual, and the speech-text shared encoder and decoder are designed to be trained on speech and text data simultaneously. By doing so, LA-NAT aims to make the ASR model aware of lexical information, so the resulting model is expected to achieve better results by leveraging the learned linguistic knowledge. A series of experiments is conducted on the AISHELL-1, CSJ, and TEDLIUM 2 datasets. According to the experiments, the proposed LA-NAT provides better results than other recently proposed non-autoregressive ASR models. In addition, LA-NAT is more compact than most non-autoregressive ASR models, and it is about 58 times faster than the classic autoregressive model.
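
As a rough illustration of why non-autoregressive decoding is fast and what the conditional independence assumption means, the sketch below shows CTC-style greedy decoding, one common non-autoregressive scheme. The abstract does not state which decoding scheme LA-NAT uses, so this is an illustrative stand-in rather than the authors' method. The function name `ctc_greedy_decode`, the blank symbol, the vocabulary, and the toy posteriors are all hypothetical.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, not taken from the paper


def ctc_greedy_decode(frame_posteriors, vocab):
    """Pick the best token for each frame, collapse repeats, drop blanks."""
    # Every frame is decided on its own: no choice depends on a previously
    # emitted token. This is the conditional independence assumption, and it
    # is why all frames can be decoded in one parallel pass.
    best = [vocab[max(range(len(p)), key=p.__getitem__)] for p in frame_posteriors]
    # Merge runs of identical consecutive tokens into a single token.
    collapsed = [token for token, _ in groupby(best)]
    # Blanks only separate genuine repeats; they are not part of the output.
    return [token for token in collapsed if token != BLANK]


# Toy per-frame posteriors over a three-symbol vocabulary (made-up values).
vocab = [BLANK, "a", "b"]
posteriors = [
    [0.10, 0.80, 0.10],  # best: a
    [0.20, 0.70, 0.10],  # best: a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # best: blank (separates the two a's)
    [0.10, 0.60, 0.30],  # best: a
    [0.10, 0.20, 0.70],  # best: b
]
print(ctc_greedy_decode(posteriors, vocab))  # ['a', 'a', 'b']
```

An autoregressive decoder would instead run one sequential step per output token, each conditioned on the tokens already emitted. That dependency is what non-autoregressive models give up in exchange for speed, and relaxing it is one of the research directions the abstract mentions.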

