CL AS
Sep 12, 2025

Prominence-aware automatic speech recognition for conversational speech

arXiv:2509.10116v1
h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses prosody-enhanced ASR for linguistic research and dialogue systems, but it is incremental as it combines existing methods without significant performance gains.

The paper integrates prosodic prominence detection with automatic speech recognition for conversational Austrian German. It reaches 85.53% prominence detection accuracy on utterances whose word sequence was recognized correctly, but overall ASR performance does not improve over the baseline.

This paper investigates prominence-aware automatic speech recognition (ASR) by combining prominence detection and speech recognition for conversational Austrian German. First, prominence detectors were developed by fine-tuning wav2vec2 models to classify word-level prominence. The detector was then used to automatically annotate prosodic prominence in a large corpus. Based on those annotations, we trained novel prominence-aware ASR systems that simultaneously transcribe words and their prominence levels. The integration of prominence information did not change performance compared to our baseline ASR system, while reaching a prominence detection accuracy of 85.53% for utterances where the recognized word sequence was correct. This paper shows that transformer-based models can effectively encode prosodic information and represents a novel contribution to prosody-enhanced ASR, with potential applications for linguistic research and prosody-informed dialogue systems.
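The reported prominence accuracy is conditioned on utterances where the recognized word sequence was correct. The following is a minimal sketch of how such a conditional metric could be computed, not the paper's evaluation code. The `word|level` token format, the function names `parse` and `prominence_accuracy`, and the choice to count accuracy per word are all assumptions made for illustration.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (word, prominence level); this encoding is an assumption


def parse(tagged: str, sep: str = "|") -> List[Token]:
    """Split a transcript of 'word|level' tokens (hypothetical format)."""
    tokens = []
    for item in tagged.split():
        word, level = item.rsplit(sep, 1)
        tokens.append((word, int(level)))
    return tokens


def prominence_accuracy(refs: List[str], hyps: List[str]) -> Tuple[float, int]:
    """Word-level prominence accuracy, restricted to utterances whose
    hypothesis word sequence exactly matches the reference.

    Returns (accuracy, number of utterances that were counted).
    """
    correct = total = used = 0
    for ref, hyp in zip(refs, hyps):
        r, h = parse(ref), parse(hyp)
        # Skip utterances with any word error: labels cannot be aligned one-to-one.
        if [w for w, _ in r] != [w for w, _ in h]:
            continue
        used += 1
        for (_, ref_level), (_, hyp_level) in zip(r, h):
            total += 1
            correct += int(ref_level == hyp_level)
    accuracy = correct / total if total else 0.0
    return accuracy, used


# Toy data, invented for illustration only.
refs = ["das|0 ist|0 gut|1", "ja|1 genau|0"]
hyps = ["das|0 ist|1 gut|1", "ja|1 genua|0"]
acc, used = prominence_accuracy(refs, hyps)
```

In this toy case only the first utterance is counted, because the second contains a word error. Restricting the metric this way keeps prominence errors separate from word recognition errors, at the cost of ignoring prominence labels in partially correct utterances.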
