CL, SD, AS · Aug 15, 2023

Improving CTC-AED model with integrated-CTC and auxiliary loss regularization

arXiv:2308.08449v1 · 2 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work offers an incremental improvement to automatic speech recognition models, of interest to researchers and practitioners.

The paper improves joint CTC-AED models for automatic speech recognition by proposing integrated-CTC fusion methods and auxiliary loss regularization. Of the two fusion methods, direct addition of logits (DAL) performs better in attention rescoring, while preserving the maximum probability (PMP) performs better in CTC prefix beam search and greedy search.

Connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) joint training has been widely applied in automatic speech recognition (ASR). Unlike most hybrid models, which calculate the CTC and AED losses separately, our proposed integrated-CTC uses the attention mechanism of AED to guide the output of CTC. In this paper, we employ two fusion methods, namely direct addition of logits (DAL) and preserving the maximum probability (PMP). We achieve dimensional consistency by adaptively affine transforming the attention results to match the dimensions of CTC. To accelerate model convergence and improve accuracy, we introduce auxiliary loss regularization. Experimental results demonstrate that the DAL method performs better in attention rescoring, while the PMP method excels in CTC prefix beam search and greedy search.
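The two fusion methods named in the abstract can be illustrated for a single frame with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the helper names (`affine`, `fuse_dal`, `fuse_pmp`), the toy weights, the assumption that the attention result is already aligned to one CTC frame, and in particular the reading of PMP as "keep the larger of the two probabilities per token, then renormalize". It should not be taken as the authors' implementation.

```python
import math

def softmax(row):
    # Numerically stable softmax over one frame of logits.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def affine(vec, weight, bias):
    # Hypothetical affine map from attention dimension D to vocabulary size V.
    # weight has D rows and V columns; bias has V entries.
    return [
        sum(vec[d] * weight[d][v] for d in range(len(vec))) + bias[v]
        for v in range(len(bias))
    ]

def fuse_dal(ctc_logits, att_logits):
    # Direct addition of logits: sum the two logit vectors, then normalize.
    return softmax([c + a for c, a in zip(ctc_logits, att_logits)])

def fuse_pmp(ctc_logits, att_logits):
    # Assumed reading of "preserving the maximum probability":
    # keep the larger probability per token and renormalize.
    kept = [max(c, a) for c, a in zip(softmax(ctc_logits), softmax(att_logits))]
    total = sum(kept)
    return [k / total for k in kept]

# Toy frame: V = 3 output tokens, attention result of dimension D = 2.
ctc_frame = [2.0, 0.5, -1.0]
att_vec = [1.0, -1.0]
weight = [[0.5, 1.0, 0.0],
          [0.0, -0.5, 1.0]]
bias = [0.0, 0.0, 0.0]

att_logits = affine(att_vec, weight, bias)  # projected to match CTC's V
dal_probs = fuse_dal(ctc_frame, att_logits)
pmp_probs = fuse_pmp(ctc_frame, att_logits)
```

In both variants the affine projection is what brings the attention result to the same dimension as the CTC output, so that the two can be combined token by token.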

