Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification
This paper addresses efficient multilingual speech recognition for edge device users and represents an incremental improvement in model selection and compression.
The paper tackles multilingual speech recognition on resource-constrained edge devices. It proposes using language and accent identification to dynamically select among fine-tuned monolingual ASR models. Initial results are promising, with memory usage below 1/12th that of other solutions.
Running automatic speech recognition (ASR) on edge devices is non-trivial due to resource constraints, especially in scenarios that require supporting multiple languages. We propose a new approach to enable multilingual speech recognition on edge devices. This approach uses both language identification and accent identification to select, on the fly, one of multiple monolingual ASR models, each fine-tuned for a particular accent. Initial results for both recognition performance and resource usage are promising, with our approach using less than 1/12th of the memory consumed by other solutions.
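The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `DynamicASRSelector`, the callables `identify_language` and `identify_accent`, the `loaders` registry, and the stub models are all hypothetical. The sketch also assumes that only the currently selected model is kept in memory, which the abstract does not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

Audio = bytes                      # placeholder; a real system would pass samples or features
Key = Tuple[str, str]              # (language, accent)
Transcriber = Callable[[Audio], str]


@dataclass
class DynamicASRSelector:
    """Hypothetical dispatcher: pick one accent-specific monolingual model per utterance."""
    identify_language: Callable[[Audio], str]
    identify_accent: Callable[[Audio, str], str]
    loaders: Dict[Key, Callable[[], Transcriber]]
    load_count: int = field(default=0, init=False)
    _active_key: Optional[Key] = field(default=None, init=False)
    _active_model: Optional[Transcriber] = field(default=None, init=False)

    def transcribe(self, audio: Audio) -> str:
        # Step 1: language identification, then accent identification within that language.
        language = self.identify_language(audio)
        accent = self.identify_accent(audio, language)
        key = (language, accent)
        if key not in self.loaders:
            raise KeyError(f"no ASR model registered for {key}")
        # Step 2: swap models only when the selection changes (assumed policy).
        if key != self._active_key:
            self._active_model = None              # release the previous model first
            self._active_model = self.loaders[key]()
            self._active_key = key
            self.load_count += 1
        # Step 3: run the selected monolingual model.
        return self._active_model(audio)


def stub_loader(tag: str) -> Callable[[], Transcriber]:
    """Stand-in for loading a fine-tuned model from storage."""
    return lambda: (lambda audio: f"{tag}:{len(audio)}")


# Toy identifiers keyed on the first and last byte, purely for demonstration.
selector = DynamicASRSelector(
    identify_language=lambda audio: "en" if audio[:1] == b"E" else "es",
    identify_accent=lambda audio, language: "a" if audio[-1:] == b"a" else "b",
    loaders={
        (lang, acc): stub_loader(f"{lang}-{acc}")
        for lang in ("en", "es")
        for acc in ("a", "b")
    },
)
```

Calling `selector.transcribe(audio)` repeatedly with utterances in the same language and accent reuses the loaded model. A change in either one triggers a swap, which trades load latency for a smaller resident footprint under the single-model assumption.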