AS SD · Mar 11

Synthetic Data Domain Adaptation for ASR via LLM-based Text and Phonetic Respelling Augmentation

Natsuo Yamashita, Koichi Nagatsuka, Hiroaki Kokubo, Kota Dohi, Tuan Vu Ho

arXiv:2603.16920 · 33.51 citations · h-index: 9

AI Analysis

This work addresses domain-specific ASR degradation for applications with scarce in-domain resources, representing an incremental improvement through novel augmentation methods.

The paper tackles domain adaptation for automatic speech recognition (ASR) by proposing a synthetic-data framework that combines LLM-based text augmentation with phonetic respelling augmentation, yielding consistent word error rate reductions across four domain-specific datasets.

End-to-end automatic speech recognition often degrades on domain-specific data due to scarce in-domain resources. We propose a synthetic-data-based domain adaptation framework with two contributions: (1) a large language model (LLM)-based text augmentation pipeline with a filtering strategy that balances lexical diversity, perplexity, and domain-term coverage, and (2) phonetic respelling augmentation (PRA), a novel method that introduces pronunciation variability through LLM-generated orthographic pseudo-spellings. Unlike conventional acoustic-level methods such as SpecAugment, PRA provides phonetic diversity before speech synthesis, enabling synthetic speech to better approximate real-world variability. Experimental results across four domain-specific datasets demonstrate consistent reductions in word error rate, confirming that combining domain-specific lexical coverage with realistic pronunciation variation significantly improves ASR robustness.
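The abstract does not say how the filtering strategy combines its three criteria, so the following is only a minimal sketch of one way such a filter could look. Everything in it is an assumption made for illustration: the function names, the type-token ratio as the diversity measure, the add-one-smoothed unigram model standing in for an LLM perplexity score, and the equal weights. It is not the authors' pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def type_token_ratio(tokens):
    # Lexical diversity proxy: unique tokens divided by total tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_perplexity(tokens, counts, total, vocab_size):
    # Stand-in for LLM perplexity (assumption): add-one-smoothed unigram model.
    if not tokens:
        return float("inf")
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return math.exp(-log_prob / len(tokens))


def domain_coverage(tokens, domain_terms):
    # Fraction of the domain terms that appear in the sentence.
    if not domain_terms:
        return 0.0
    present = set(tokens)
    return sum(1 for term in domain_terms if term in present) / len(domain_terms)


def filter_candidates(candidates, reference_texts, domain_terms,
                      keep=2, weights=(1.0, 1.0, 1.0)):
    # Rank candidates by a weighted score (weights are illustrative only):
    # reward diversity and domain coverage, penalize high log-perplexity.
    ref_tokens = [t for text in reference_texts for t in tokenize(text)]
    counts = Counter(ref_tokens)
    total = len(ref_tokens)
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    w_div, w_ppl, w_cov = weights

    scored = []
    for text in candidates:
        tokens = tokenize(text)
        ttr = type_token_ratio(tokens)
        ppl = unigram_perplexity(tokens, counts, total, vocab_size)
        cov = domain_coverage(tokens, domain_terms)
        score = w_div * ttr - w_ppl * math.log(ppl) + w_cov * cov
        scored.append((score, text))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:keep]]


if __name__ == "__main__":
    reference = ["the patient was given insulin",
                 "surgery was scheduled for the patient"]
    candidates = ["the patient received a dose of insulin before surgery",
                  "the the the the the",
                  "zxq blorp fnord"]
    print(filter_candidates(candidates, reference, {"insulin", "surgery"}, keep=1))
```

In a real setup the unigram model would presumably be replaced by a language-model perplexity score, and the weights would need tuning per domain; the sketch only shows how the three signals could be traded off in a single ranking.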

