AS CL Jul 13, 2023

Personalization for BERT-based Discriminative Speech Recognition Rescoring

Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

arXiv:2307.06832v1 3.36 citations h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses the problem of personalized speech recognition for users of virtual voice assistants, presenting incremental improvements through tailored rescoring methods.

The paper tackles the challenge of recognizing personalized content in speech recognition by exploring three approaches to neural rescoring (gazetteers, prompting, and a cross-attention encoder-decoder model), each of which improves word error rate by over 10% against a neural rescoring baseline on a test set with personalized named entities.

Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a general test set by 1%.
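
The figures above are stated as improvements in word error rate. As a reading aid, the sketch below shows how WER is conventionally computed (word-level edit distance divided by reference length) and how a relative improvement over a baseline would be derived. It assumes the reported percentages are relative rather than absolute, which the abstract does not state. The function names and the example utterances are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer (assumed relative metric)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a personalized contact name is misrecognized.
baseline = word_error_rate("call john smith", "call jon smith")   # one substitution in three words
rescored = word_error_rate("call john smith", "call john smith")  # exact match
```

Under this reading, moving from a baseline WER of 0.10 to 0.09 would count as a 10% relative improvement, even though the absolute change is one percentage point.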
