SD CL AS | Jul 24, 2023

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

Edward Fish, Umberto Michieli, Mete Ozay

arXiv:2307.12659v29.58 citations | h-index: 19 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses the need for efficient ASR models tailored to diverse users on resource-constrained devices, and it represents an incremental advance in quantization techniques.

The paper tackles the problem of deploying large automatic speech recognition (ASR) models on mobile devices by proposing a label-free, personalized mixed-precision quantization method called myQASR, which improves performance for specific genders, languages, and speakers without fine-tuning.

Recent advances in Automatic Speech Recognition (ASR) have produced large AI models that are impractical to deploy on mobile devices. Model quantization is effective at producing compressed general-purpose models; however, such models may only be deployed to a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small set of unlabelled samples from the target domain. To this end, we propose myQASR, a mixed-precision quantization method that generates tailored quantization schemes for diverse users under any memory requirement with no fine-tuning. myQASR automatically evaluates the quantization sensitivity of network layers by analyzing the full-precision activation values. We are then able to generate a personalized mixed-precision quantization scheme for any pre-determined memory budget. Results for large-scale ASR models show how myQASR improves performance for specific genders, languages, and speakers.
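The two steps the abstract describes (scoring layer sensitivity from full-precision activations, then fitting bit-widths to a memory budget) can be illustrated with a minimal sketch. This is not the authors' implementation: the sensitivity proxy (median absolute activation), the greedy allocation loop, and the names `layer_sensitivity`, `model_size_bits`, and `allocate_bits` are all assumptions made for illustration only.

```python
import numpy as np


def layer_sensitivity(activations):
    """Score each layer from unlabelled full-precision activations.

    ASSUMPTION: the median absolute activation is used as a stand-in
    sensitivity proxy; the paper's actual statistic may differ.
    """
    return [float(np.median(np.abs(a))) for a in activations]


def model_size_bits(param_counts, bits):
    """Total weight storage in bits for a per-layer bit-width assignment."""
    return sum(n * b for n, b in zip(param_counts, bits))


def allocate_bits(sensitivity, param_counts, budget_bits, max_bits=8, min_bits=2):
    """Greedy (hypothetical) allocation: start every layer at max_bits and
    lower bit-widths one step at a time, least sensitive layer first,
    until the model fits within budget_bits."""
    bits = [max_bits] * len(param_counts)
    order = np.argsort(sensitivity)  # least sensitive layer first
    while model_size_bits(param_counts, bits) > budget_bits:
        changed = False
        for i in order:
            if bits[i] > min_bits:
                bits[i] -= 1
                changed = True
                if model_size_bits(param_counts, bits) <= budget_bits:
                    break
        if not changed:
            raise ValueError("budget cannot be met even at min_bits")
    return bits


# Toy calibration data: three layers with increasing activation magnitude.
activations = [np.full((4, 4), 0.1), np.full((4, 4), 1.0), np.full((4, 4), 5.0)]
param_counts = [100, 100, 100]
scores = layer_sensitivity(activations)
scheme = allocate_bits(scores, param_counts, budget_bits=2000)
print(scheme)
```

In this toy setup, layers with larger scores keep at least as many bits as layers with smaller scores, and no labels or fine-tuning are involved, which mirrors the label-free setting the abstract describes.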
