CL | SD | AS | Dec 6, 2023

Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation

arXiv:2312.03312v1 | 5 citations | h-index: 5 | ASRU
Originality: Incremental advance
AI Analysis

This work addresses speech recognition challenges for low-resource languages, representing an incremental improvement in two-pass ASR systems.

The paper improves speech recognition for low-resource languages by optimizing both stages of two-pass cross-lingual transfer learning: phoneme recognition and phoneme-to-grapheme translation. It reports significant reductions in Word Error Rate (WER) on the CommonVoice 12.0 dataset.

This research optimizes two-pass cross-lingual transfer learning in low-resource languages by enhancing phoneme recognition and phoneme-to-grapheme translation models. Our approach optimizes these two stages to improve speech recognition across languages. We optimize phoneme vocabulary coverage by merging phonemes based on shared articulatory characteristics, thus improving recognition accuracy. Additionally, we introduce a global phoneme noise generator that produces realistic ASR noise during phoneme-to-grapheme training to reduce error propagation. Experiments on the CommonVoice 12.0 dataset show significant reductions in Word Error Rate (WER) for low-resource languages, highlighting the effectiveness of our approach. This research contributes to the advancement of two-pass ASR systems in low-resource languages, offering the potential for improved cross-lingual transfer learning.
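To make the two ideas in the abstract concrete, here is a minimal Python sketch of phoneme merging and of noise injection for phoneme-to-grapheme training data. It is an illustration under stated assumptions, not the authors' method: the merge map, the function names `merge_phonemes` and `add_phoneme_noise`, and the uniform substitution/deletion/insertion probabilities are all hypothetical. The paper's generator is described as producing realistic ASR noise, which a uniform random model like this one does not attempt to reproduce.

```python
import random

# Hypothetical merge map: each rare phoneme is folded into a phoneme that is
# assumed to share its articulatory characteristics. Not taken from the paper.
MERGE_MAP = {"ɒ": "ɔ", "ɫ": "l"}


def merge_phonemes(phonemes, merge_map=MERGE_MAP):
    """Replace each phoneme with its merged representative, if it has one."""
    return [merge_map.get(p, p) for p in phonemes]


def add_phoneme_noise(phonemes, vocab, p_sub=0.1, p_del=0.05, p_ins=0.05, rng=None):
    """Return a noisy copy of `phonemes`.

    Each phoneme is deleted with probability p_del, otherwise substituted by a
    uniformly drawn vocabulary item with probability p_sub, otherwise kept.
    After every surviving phoneme, a random phoneme is inserted with
    probability p_ins. Uniform sampling is an assumption of this sketch.
    """
    rng = rng or random.Random()
    noisy = []
    for p in phonemes:
        r = rng.random()
        if r < p_del:
            continue  # deletion: drop the phoneme entirely
        if r < p_del + p_sub:
            noisy.append(rng.choice(vocab))  # substitution (may redraw the same phoneme)
        else:
            noisy.append(p)  # phoneme kept unchanged
        if rng.random() < p_ins:
            noisy.append(rng.choice(vocab))  # insertion after the current phoneme
    return noisy
```

In a training loop, the clean phoneme sequence for each transcript would be passed through `add_phoneme_noise` before being fed to the phoneme-to-grapheme model, so that the model sees errors resembling those of the first-pass recognizer rather than only clean input.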

