Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
This work addresses on-device personalized speech recognition, but it appears incremental because it builds on existing contextual adaptation methods.
The paper tackles the problem of improving automatic speech recognition (ASR) of rare words under on-device personalization by introducing a model-based, end-to-end contextual adaptation approach, which outperforms traditional re-scoring by 12% relative WER and 15.7% in entity-mention-specific F1-score in on-device simulation experiments.
Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training, it can yield even better recognition results. However, traditional re-scoring approaches based on an external language model are prone to diverging during personalized training. In this work, we introduce a model-based end-to-end contextual adaptation approach that is decoder-agnostic and amenable to on-device personalization. Our on-device simulation experiments demonstrate that the proposed approach outperforms the traditional re-scoring technique by 12% relative WER and 15.7% in entity-mention-specific F1-score in a continuous personalization scenario.
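For readers unfamiliar with the headline metric, the following is a minimal sketch of how a word error rate (WER) and a relative WER reduction are commonly computed. It assumes that the 12% figure is a relative reduction over the re-scoring baseline. The function names, the example utterance, and the numeric WER values are hypothetical illustrations, not the paper's evaluation code or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which `proposed` lowers an error metric versus `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Hypothetical utterance: one rare name is misrecognized out of three words.
wer = word_error_rate("call anya tonight", "call anna tonight")

# Hypothetical WER values chosen only to illustrate a 12% relative reduction.
rel = relative_reduction(baseline=0.100, proposed=0.088)
```

Under these assumed values, a baseline WER of 10.0% and a proposed-system WER of 8.8% correspond to a 12% relative reduction, although the absolute difference is only 1.2 percentage points.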