CL LG SD

Mar 10, 2016

Personalized Speech recognition on mobile devices

Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada

arXiv:1603.03185v219.2198 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of deploying personalized speech recognition on resource-constrained smartphones, offering incremental improvements in efficiency and adaptability.

The paper tackles the problem of enabling accurate, low-latency, and memory-efficient large vocabulary speech recognition on mobile devices, achieving a 13.5% word error rate on dictation tasks with a median speed seven times faster than real-time.

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
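The abstract mentions a quantized acoustic model but does not spell out the scheme. As a rough illustration of what 8-bit weight quantization generally involves, here is a minimal sketch that assumes simple uniform min/max quantization. This is not necessarily the authors' method, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(weights, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] on a uniform grid.

    Assumption: plain min/max range quantization, used for illustration only.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    # Guard against a constant vector, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float values from the integer codes."""
    return [c * scale + lo for c in codes]


# Toy weight vector (made-up values, not taken from the paper).
weights = [-0.52, -0.1, 0.0, 0.33, 0.75]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)

# Rounding to the nearest grid point bounds the error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing each weight as one byte instead of a 32-bit float cuts the memory for that tensor by roughly four times, at the cost of a reconstruction error of at most half a quantization step per weight.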
