SD, CL, AS | Jan 29, 2023

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

arXiv:2301.12343v1 | 11 citations | h-index: 49
Originality: Incremental advance
AI Analysis

This work addresses timestamp prediction for end-to-end ASR systems and is an incremental improvement over existing methods.

The paper tackles timestamp prediction in non-autoregressive end-to-end ASR models, which lack the timestamp ability of conventional systems. It optimizes the continuous integrate-and-fire mechanism in Paraformer, reducing AAS by 66.7% and DER by 82.1% on a test set with manually marked timestamps.

Conventional ASR systems use frame-level phoneme posteriors to conduct force-alignment (FA) and provide timestamps, while end-to-end ASR systems, especially AED-based ones, lack this ability. This paper proposes to perform timestamp prediction (TP) while recognizing by utilizing the continuous integrate-and-fire (CIF) mechanism in the non-autoregressive ASR model Paraformer. Focusing on the fire-place bias issue of CIF, we apply post-processing strategies including fire-delay and silence insertion. In addition, we propose scaled-CIF to smooth the weights of the CIF output, which proves beneficial for both the ASR and TP tasks. Accumulated averaging shift (AAS) and diarization error rate (DER) are adopted to measure the quality of timestamps, and we compare these metrics between the proposed system and a conventional hybrid force-alignment system. Experimental results on a test set with manually marked timestamps show that the proposed optimization methods significantly improve the accuracy of CIF timestamps, reducing AAS and DER by 66.7% and 82.1% respectively. Compared to Kaldi force-alignment trained with the same data, the optimized CIF timestamps achieve a 12.3% relative AAS reduction.
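To make the idea of reading timestamps off CIF more concrete, here is a minimal Python sketch. It is not the paper's implementation: the function names, the firing threshold of 1.0, and the 60 ms frame shift are all assumptions chosen for illustration, and the fire-delay, silence insertion, and scaled-CIF steps from the abstract are not reproduced. The second helper computes a simple mean absolute boundary shift, which is only in the spirit of AAS; the paper's exact formula may differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_ms, end_ms) of one token


def cif_fire_frames(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Accumulate per-frame weights and record the frame where each token fires.

    Assumes every weight is below the threshold, so at most one token
    fires per frame.
    """
    fires = []
    acc = 0.0
    for t, alpha in enumerate(alphas):
        acc += alpha
        if acc >= threshold:
            fires.append(t)
            acc -= threshold  # carry the remainder over to the next token
    return fires


def fires_to_spans(fires: List[int], frame_shift_ms: int = 60) -> List[Span]:
    """Treat each fire as a token end; each token starts where the previous one ended.

    The 60 ms frame shift is a hypothetical value, not taken from the source.
    """
    spans = []
    prev_end = 0
    for f in fires:
        end = (f + 1) * frame_shift_ms
        spans.append((prev_end, end))
        prev_end = end
    return spans


def mean_boundary_shift(pred: List[Span], ref: List[Span]) -> float:
    """Mean absolute shift over all start and end boundaries (an AAS-like
    measure used here for illustration only)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and of equal length")
    total = sum(abs(ps - rs) + abs(pe - re) for (ps, pe), (rs, re) in zip(pred, ref))
    return total / (2 * len(pred))


if __name__ == "__main__":
    # Toy weights, made up for this example.
    alphas = [0.5, 0.5, 0.25, 0.25, 0.5, 0.5]
    spans = fires_to_spans(cif_fire_frames(alphas))
    print(spans)
    print(mean_boundary_shift(spans, [(0, 100), (100, 320)]))
```

Because a token boundary is only emitted once the accumulated weight crosses the threshold, the fire frame does not necessarily coincide with the true acoustic boundary, which is the kind of fire-place bias the abstract's post-processing is meant to address.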

Code Implementations: 1 repo
