ASCL | Jan 1, 2025

Automatic Text Pronunciation Correlation Generation and Application for Contextual Biasing

arXiv:2501.00804v1 | h-index: 8 | ICASSP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of distinguishing pronunciation correlations for speech recognition, particularly benefiting languages or dialects lacking manual pronunciation lexicons, though it appears incremental as it builds on existing alignment and encoding techniques.

The paper tackles the problem of automatically generating pronunciation correlations between written texts, traditionally done manually, by proposing a data-driven method called ATPC that uses speech and text annotations. Experimental results on Mandarin show that ATPC improves end-to-end automatic speech recognition performance in contextual biasing.

Effectively distinguishing the pronunciation correlations between different written texts is a significant issue in linguistic acoustics. Traditionally, such pronunciation correlations are obtained through manually designed pronunciation lexicons. In this paper, we propose a data-driven method to acquire these pronunciation correlations automatically, called automatic text pronunciation correlation (ATPC). The supervision required for this method is the same as that needed to train end-to-end automatic speech recognition (E2E-ASR) systems, i.e., speech and the corresponding text annotations. First, the iteratively-trained timestamp estimator (ITSE) algorithm is employed to align the speech with its corresponding annotated text symbols. Then, a speech encoder is used to convert the speech into speech embeddings. Finally, we compare the speech embedding distances of different text symbols to obtain ATPC. Experimental results on Mandarin show that ATPC enhances E2E-ASR performance in contextual biasing and holds promise for dialects or languages lacking manually designed pronunciation lexicons.
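
The three steps in the abstract (align, embed, compare distances) can be illustrated with a minimal sketch. It is not the authors' implementation: the ITSE aligner and the speech encoder are not reproduced, and their outputs are replaced by toy inputs. Mean pooling over aligned frames, cosine distance, and the names `symbol_centroids` and `pairwise_cosine_distance` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def symbol_centroids(frame_embeddings, alignments):
    """Average frame embeddings over every aligned span of each text symbol.

    frame_embeddings: (T, D) array, standing in for speech-encoder output.
    alignments: list of (symbol, start_frame, end_frame), end exclusive,
                standing in for the output of an aligner such as ITSE.
    Mean pooling is an assumption; the abstract does not specify the pooling.
    """
    sums, counts = {}, {}
    for symbol, start, end in alignments:
        if end <= start:
            continue  # skip empty spans
        segment_mean = frame_embeddings[start:end].mean(axis=0)
        sums[symbol] = sums.get(symbol, 0) + segment_mean
        counts[symbol] = counts.get(symbol, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

def pairwise_cosine_distance(centroids):
    """Return the sorted symbols and their pairwise cosine-distance matrix.

    Cosine distance is an assumed metric, chosen only for illustration.
    """
    symbols = sorted(centroids)
    mat = np.stack([centroids[s] for s in symbols])
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    unit = mat / np.clip(norms, 1e-12, None)
    return symbols, 1.0 - unit @ unit.T

# Toy data: four frames of 2-D embeddings and three hypothetical symbols.
frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
alignments = [("a", 0, 2), ("b", 2, 3), ("c", 3, 4)]

symbols, dist = pairwise_cosine_distance(symbol_centroids(frames, alignments))
idx = {s: i for i, s in enumerate(symbols)}
# "a" and "b" have nearby embeddings, so their distance is smaller than "a" to "c".
print(dist[idx["a"], idx["b"]] < dist[idx["a"], idx["c"]])
```

In this toy setup, a small distance between two symbols plays the role of a strong pronunciation correlation. How such distances are then used for contextual biasing is not described in the abstract and is left out here.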

