SD | AI | LG | AS | Dec 19, 2025

When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems

arXiv:2512.17562v1 | 1 citation | h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a practical problem for teams deploying medical scribe systems in noisy clinical environments. It shows that traditional noise reduction can harm transcription accuracy, an incremental but important finding.

The study systematically evaluated the effect of speech enhancement preprocessing on modern medical ASR systems, finding that it degrades performance across all tested models and noise conditions, with absolute semWER increases ranging from 1.1% to 46.6%.

Speech enhancement methods are commonly believed to improve the performance of automatic speech recognition (ASR) in noisy environments. However, the effectiveness of these techniques cannot be taken for granted for modern large-scale ASR models trained on diverse, noisy data. We present a systematic evaluation of MetricGAN-plus-voicebank denoising on four state-of-the-art ASR systems (OpenAI Whisper, NVIDIA Parakeet, Google Gemini Flash 2.0, and Parrotlet-a) using 500 medical speech recordings under nine noise conditions. ASR performance is measured using semantic WER (semWER), a normalized word error rate (WER) metric that accounts for domain-specific normalizations. Our results reveal a counterintuitive finding: speech enhancement preprocessing degrades ASR performance across all noise conditions and models. Original noisy audio achieves lower semWER than enhanced audio in all 40 tested configurations (4 models × 10 conditions), with degradations ranging from 1.1% to 46.6% absolute semWER increase. These findings suggest that modern ASR models possess sufficient internal noise robustness and that traditional speech enhancement may remove acoustic features critical for ASR. For practitioners deploying medical scribe systems in noisy clinical environments, our results indicate that preprocessing audio with noise reduction techniques may be not only computationally wasteful but also harmful to transcription accuracy.
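To make the metric concrete, here is a minimal Python sketch of a normalized WER and of the "absolute increase" comparison between noisy and enhanced audio. It is an illustration under assumptions, not the paper's semWER implementation: the normalization rules in `EQUIVALENTS`, the function names `normalize` and `wer`, and the example transcripts are all hypothetical, since the abstract does not specify the domain-specific normalizations.

```python
import re

# Hypothetical equivalence table (assumption): the actual semWER
# normalization rules are not described in the abstract.
EQUIVALENTS = {"milligrams": "mg", "milligram": "mg", "twice daily": "bid"}


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation, and map assumed equivalent forms to one token."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    for phrase, canonical in EQUIVALENTS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    return text.split()


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the normalized reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts for illustration only; not data from the study.
reference = "Take 500 milligrams twice daily."
noisy_hyp = "take 500 mg BID"       # hypothetical ASR output on noisy audio
enhanced_hyp = "take 500 mg daily"  # hypothetical ASR output on enhanced audio

# Absolute increase = score on enhanced audio minus score on noisy audio.
increase = wer(reference, enhanced_hyp) - wer(reference, noisy_hyp)
print(f"absolute increase: {increase:.1%}")
```

With normalization applied to both sides, surface differences such as "milligrams" versus "mg" do not count as errors, so the difference between the two scores reflects recognition errors rather than formatting.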

