AS, AI, LG, SD | Sep 22, 2023

Memory-augmented conformer for improved end-to-end long-form ASR

arXiv:2309.13029v1 | 3 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This paper addresses the handling of long-form speech in ASR systems, though the contribution appears incremental because it builds on existing conformer architectures.

The paper tackles the degraded performance of end-to-end automatic speech recognition (ASR) models on long utterances, a problem that is especially pronounced for attention-based models such as conformers, by proposing a memory-augmented conformer. In experiments on Librispeech, the proposed Conformer-NTM model outperforms the baseline conformer on long utterances.

Conformers have recently been proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming recurrent neural network-based approaches and transformers. Nevertheless, in general, the performance of these end-to-end models, especially attention-based models, is particularly degraded in the case of long utterances. To address this limitation, we propose adding a fully-differentiable memory-augmented neural network between the encoder and decoder of a conformer. This external memory can enrich the generalization for longer utterances since it allows the system to store and retrieve more information recurrently. Notably, we explore the neural Turing machine (NTM) that results in our proposed Conformer-NTM model architecture for ASR. Experimental results using Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the baseline conformer without memory for long utterances.
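
The memory mechanism described in the abstract can be illustrated with a minimal sketch of NTM-style content-based addressing, written in plain Python. This is not the authors' implementation. The function names (`content_weights`, `read`, `write`, `augment`), the fixed erase vector, the use of each encoder frame as both key and add vector, and the choice to sum the read vector with the frame are all assumptions made for illustration. In an actual NTM these quantities are produced by learned controller layers, and the paper's way of combining memory output with encoder output may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def content_weights(memory, key, beta):
    """Softmax over beta-scaled cosine similarity between the key and each slot."""
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, w):
    """Read vector: weighted sum of memory slots."""
    width = len(memory[0])
    return [sum(w[i] * memory[i][j] for i in range(len(memory)))
            for j in range(width)]

def write(memory, w, erase, add):
    """Erase-then-add update applied to every slot, scaled by its weight."""
    return [[row[j] * (1.0 - w[i] * erase[j]) + w[i] * add[j]
             for j in range(len(row))]
            for i, row in enumerate(memory)]

def augment(encoder_frames, n_slots=4, beta=5.0):
    """Hypothetical glue: pass encoder frames through the memory one at a time.

    Assumptions (not from the paper): the frame itself is the key and the
    add vector, the erase vector is a constant 0.5, and the read vector is
    summed with the frame before it would be handed to the decoder.
    """
    width = len(encoder_frames[0])
    memory = [[1e-3] * width for _ in range(n_slots)]
    outputs = []
    for frame in encoder_frames:
        w = content_weights(memory, frame, beta)
        r = read(memory, w)
        outputs.append([f + x for f, x in zip(frame, r)])
        memory = write(memory, w, [0.5] * width, frame)
    return outputs
```

Because the memory is carried from frame to frame, later frames can retrieve information written by earlier ones, which is the recurrent store-and-retrieve behaviour the abstract refers to. Every operation here is a smooth function of its inputs, which is what makes such a memory trainable end to end.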

Code Implementations: 1 repo