Domain-aware Neural Language Models for Speech Recognition
This work delivers a targeted gain in speech recognition accuracy for voice-assistant users across several domains, an incremental improvement over existing systems.
This paper addresses the challenge of improving speech recognition accuracy across diverse use cases by developing a domain-aware rescoring framework. Compared to domain-general rescoring, the framework reduces word error rate by up to 2.4% and slot word error rate by up to 4.1% on three individual domains (shopping, navigation, and music), without compromising accuracy on the general use case.
As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use cases across different domains. We present a domain-aware rescoring framework suitable for achieving domain adaptation during second-pass rescoring in production settings. In our framework, we fine-tune a domain-general neural language model on several domains, and use an LSTM-based domain classification model to select the appropriate domain-adapted model for second-pass rescoring. This domain-aware rescoring improves the word error rate by up to 2.4% and slot word error rate by up to 4.1% on three individual domains -- shopping, navigation, and music -- compared to domain-general rescoring. These improvements are obtained while maintaining accuracy for the general use case.
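
The routing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `domain_aware_rescore`, the log-linear score combination, the `lm_weight` and `min_confidence` parameters, the choice to classify the first-pass 1-best hypothesis, and the fallback to the domain-general model are all assumptions made for illustration. The toy scorers and classifier stand in for the fine-tuned neural language models and the LSTM-based domain classifier.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[str, float]                   # (text, first-pass log score)
Scorer = Callable[[str], float]                  # text -> LM log probability
Classifier = Callable[[str], Tuple[str, float]]  # text -> (domain, confidence)


def domain_aware_rescore(
    nbest: List[Hypothesis],
    classify_domain: Classifier,
    domain_lms: Dict[str, Scorer],
    general_lm: Scorer,
    lm_weight: float = 0.5,        # hypothetical interpolation weight
    min_confidence: float = 0.6,   # hypothetical routing threshold
) -> str:
    """Return the text of the best hypothesis after second-pass rescoring."""
    # Assumption: the classifier sees the first-pass 1-best hypothesis.
    one_best = max(nbest, key=lambda h: h[1])[0]
    domain, confidence = classify_domain(one_best)

    # Assumption: fall back to the domain-general model when the classifier
    # is unsure or no adapted model exists for the predicted domain.
    lm = general_lm
    if confidence >= min_confidence and domain in domain_lms:
        lm = domain_lms[domain]

    # Combine first-pass and language model scores, then pick the best.
    return max(nbest, key=lambda h: h[1] + lm_weight * lm(h[0]))[0]


# Toy stand-ins for the real models (illustrative only).
def general_lm(text: str) -> float:
    return -1.0 * len(text.split())


def music_lm(text: str) -> float:
    # Pretend the music-adapted model has learned the artist name.
    return sum(0.0 if word == "beatles" else -1.0 for word in text.split())


def toy_classifier(text: str) -> Tuple[str, float]:
    return ("music", 0.9) if text.startswith("play") else ("general", 1.0)


nbest = [("play the beetles", -1.0), ("play the beatles", -1.2)]
best = domain_aware_rescore(nbest, toy_classifier, {"music": music_lm}, general_lm)
```

With these toy scores, the music-adapted model promotes the second hypothesis, while routing to the general model leaves the first-pass ranking unchanged. The fallback branch is one plausible way to preserve general-use-case accuracy; the abstract does not specify how this is achieved.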