AS CL LG
Jul 2, 2022

UserLibri: A Dataset for ASR Personalization Using Only Text

Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

arXiv:2207.00706v16.618 citations
h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses on-device personalization for speech recognition when paired audio data is scarce. The advance is incremental, since it builds on existing datasets and methods.

The paper personalizes speech recognition on mobile devices by training a per-user language model on text-only data, which is more abundant on devices than paired audio-text data. The authors report a lower average per-user word error rate on both test sets, including an improvement of 2.5 for test-other users with the streaming model.

Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg. We release this User-Specific LibriSpeech (UserLibri) dataset to aid future personalization research. LibriSpeech audio-transcript pairs are grouped into 55 users from the test-clean dataset and 52 users from test-other. We are able to lower the average word error rate per user across both sets in streaming and nonstreaming models, including an improvement of 2.5 for the harder set of test-other users when streaming.
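The metric quoted above, average word error rate per user, can be illustrated with a minimal sketch. It assumes that WER is computed separately for each user (total word-level edit errors divided by total reference words for that user) and that the per-user values are then averaged with equal weight; the abstract does not spell out the exact scoring procedure. The function names, the input layout, and the percent scale are all hypothetical choices made for this illustration.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def per_user_wer(utterances):
    """Map each user to a WER in percent (scale is an assumption).

    utterances: iterable of (user_id, reference, hypothesis) strings.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for user, ref, hyp in utterances:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[user] += edit_distance(ref_tokens, hyp_tokens)
        words[user] += len(ref_tokens)
    return {u: 100.0 * errors[u] / words[u] for u in words if words[u] > 0}


def average_wer_per_user(utterances):
    """Unweighted mean of the per-user WERs (assumed averaging scheme)."""
    wers = per_user_wer(utterances)
    return sum(wers.values()) / len(wers)


# Toy data, invented for illustration only; not taken from UserLibri.
toy = [
    ("u1", "the cat sat", "the cat sat"),
    ("u1", "on the mat", "on a mat"),
    ("u2", "hello world", "hello"),
]
print(per_user_wer(toy))
print(average_wer_per_user(toy))
```

Averaging per user rather than pooling all utterances gives every user equal weight, so a user with few utterances counts as much as one with many. That property seems to fit a personalization setting, although it is an interpretation rather than something the abstract states.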
