Scalable language model adaptation for spoken dialogue systems
This addresses a practical challenge for developers of personal assistant systems in maintaining performance as applications evolve, though it appears incremental as it builds on existing adaptation methods.
The paper tackles the problem of adapting language models for spoken dialogue systems to new application intents without degrading performance on existing ones, achieving up to a 15% improvement in word error rate on new applications even without adaptation data.
Language models (LM) for interactive speech recognition systems are trained on large amounts of data and the model parameters are optimized on past user data. New application intents and interaction types are released for these systems over time, imposing challenges to adapt the LMs since the existing training data is no longer sufficient to model the future user interactions. It is unclear how to adapt LMs to new application intents without degrading the performance on existing applications. In this paper, we propose a solution to (a) estimate n-gram counts directly from the hand-written grammar for training LMs and (b) use constrained optimization to optimize the system parameters for future use cases, while not degrading the performance on past usage. We evaluated our approach on new applications intents for a personal assistant system and find that the adaptation improves the word error rate by up to 15% on new applications even when there is no adaptation data available for an application.