Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech Recognition
This work addresses robust speech recognition in varied acoustic environments and offers a practical solution for real-world applications, though the contribution is incremental because it builds on existing adversarial learning methods.
The paper tackles the problem of adapting speech recognition models to unseen recording conditions, specifically single-microphone far-field speech, using unsupervised adversarial learning. It reports a 19.8% relative WER reduction on Italian data, and 12.6% when the adaptation is performed on French data.
In this paper, we investigate the use of adversarial learning for unsupervised adaptation to unseen recording conditions, specifically single-microphone far-field speech. We adapt neural-network-based acoustic models trained on close-talk clean speech to the new recording conditions using untranscribed adaptation data. Our experimental results on the Italian SPEECON data set show that the proposed method achieves a 19.8% relative word error rate (WER) reduction compared to the unadapted models. Furthermore, this adaptation method is beneficial even when performed on data from another language (i.e., French), giving a 12.6% relative WER reduction.
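To make the reported metric concrete, the following minimal sketch computes a relative WER reduction, assuming the usual definition (baseline WER minus adapted WER, divided by baseline WER). The function name and the absolute WER values are hypothetical and chosen only for illustration; the abstract does not give absolute WERs.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction of the adapted model, in percent.

    Assumes the common definition:
        100 * (baseline_wer - adapted_wer) / baseline_wer
    Both inputs are absolute WERs expressed in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical absolute WERs for illustration only (not taken from the paper).
baseline = 30.0  # unadapted model
adapted = 24.0   # model after adaptation
print(relative_wer_reduction(baseline, adapted))  # prints 20.0
```

Under this definition, a 19.8% relative reduction means the adapted model makes roughly one fifth fewer word errors than the unadapted one, whatever the absolute error level is.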