CL, SD, AS | Aug 6, 2025

Pitch Accent Detection improves Pretrained Automatic Speech Recognition

arXiv:2508.04814v1 | h-index: 2 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work uses prosodic cues to improve ASR performance in speech processing applications. The advance is incremental, since it builds on existing pretrained models.

The authors improve automatic speech recognition (ASR) by training it jointly with pitch accent detection. The joint model closes the gap in F1-score for pitch accent detection by 41% and reduces word error rate (WER) by 28.3% on LibriSpeech under limited-resource fine-tuning.

We show that the performance of Automatic Speech Recognition (ASR) systems that use semi-supervised speech representations can be boosted by a complementary pitch accent detection module, by introducing a joint ASR and pitch accent detection model. The pitch accent detection component of our model achieves a significant improvement on the state of the art for the task, closing the gap in F1-score by 41%. Additionally, joint training reduces the ASR WER by 28.3% on LibriSpeech under limited-resource fine-tuning. With these results, we show the importance of extending pretrained speech models to retain or re-learn important prosodic cues such as pitch accent.
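
The two percentages in the abstract are easier to read with the arithmetic written out. The sketch below is an illustration only. It assumes the 28.3% figure is a relative WER reduction and that "closing the gap" means the fraction of the distance to some reference score that the new model covers. The abstract confirms neither reading, and the helper names and all input numbers are hypothetical, chosen only to reproduce the quoted percentages.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline` (e.g. for WER)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def gap_closed(baseline: float, new: float, reference: float) -> float:
    """Fraction of the distance from `baseline` to `reference` covered by `new`."""
    if reference == baseline:
        raise ValueError("reference and baseline must differ")
    return (new - baseline) / (reference - baseline)


# Hypothetical inputs (NOT taken from the paper): a WER falling from
# 10.0 to 7.17, and an F1-score rising from 0.80 to 0.882 against an
# assumed reference score of 1.0.
wer_drop = relative_reduction(baseline=10.0, new=7.17)
f1_gap = gap_closed(baseline=0.80, new=0.882, reference=1.0)

print(f"relative WER reduction: {wer_drop:.1%}")
print(f"F1 gap closed: {f1_gap:.0%}")
```

Under these assumptions, a gap closure of 41% is a much smaller absolute F1 change than a 41% improvement in F1 would be, so the two phrasings should not be used interchangeably.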
