SD, CL, AS · Jul 24, 2023

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

arXiv:2307.12659v2 · 6 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, tailored ASR models for diverse users on resource-constrained devices, and represents an incremental advance in quantization techniques.

The paper tackles the problem of deploying large automatic speech recognition (ASR) models on mobile devices by proposing a label-free, personalized mixed-precision quantization method called myQASR, which improves performance for specific genders, languages, and speakers without fine-tuning.

Recent advances in Automatic Speech Recognition (ASR) have produced large AI models that are impractical to deploy on mobile devices. Model quantization is effective at producing compressed general-purpose models; however, such models may only be deployed to a restricted sub-domain of interest. We show that ASR models can be personalized during quantization while relying on just a small set of unlabelled samples from the target domain. To this end, we propose myQASR, a mixed-precision quantization method that generates tailored quantization schemes for diverse users under any memory requirement, with no fine-tuning. myQASR automatically evaluates the quantization sensitivity of network layers by analysing the full-precision activation values. We are then able to generate a personalised mixed-precision quantization scheme for any pre-determined memory budget. Results for large-scale ASR models show how myQASR improves performance for specific genders, languages, and speakers.
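The two steps described in the abstract (score each layer's sensitivity from full-precision activations gathered on unlabelled target-domain samples, then pick per-layer bit-widths that fit a memory budget) can be illustrated with a small sketch. This is not myQASR's actual procedure: the sensitivity proxy (maximum absolute activation), the candidate bit-widths `BIT_CHOICES`, the greedy "lower the least sensitive layer first" search, and the names `Layer`, `sensitivity`, and `allocate_bits` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical candidate bit-widths, highest precision first.
BIT_CHOICES = (8, 6, 4)


@dataclass
class Layer:
    name: str
    num_params: int
    # Full-precision activations recorded while running a few
    # unlabelled target-domain samples through the model.
    activations: list[float]


def sensitivity(acts: list[float]) -> float:
    # Hypothetical proxy: a wider activation range is assumed to suffer
    # more error at low bit-widths, so it counts as more sensitive.
    return max(abs(a) for a in acts)


def memory_bits(layers: list[Layer], bits: dict[str, int]) -> int:
    # Total weight storage in bits under the given per-layer assignment.
    return sum(layer.num_params * bits[layer.name] for layer in layers)


def allocate_bits(layers: list[Layer], budget_bits: int) -> dict[str, int]:
    # Least sensitive layers are lowered first.
    order = sorted(layers, key=lambda layer: sensitivity(layer.activations))
    bits = {layer.name: BIT_CHOICES[0] for layer in layers}
    for lower in BIT_CHOICES[1:]:
        for layer in order:
            if memory_bits(layers, bits) <= budget_bits:
                return bits
            bits[layer.name] = lower
    if memory_bits(layers, bits) > budget_bits:
        raise ValueError("budget cannot be met even at the lowest bit-width")
    return bits


if __name__ == "__main__":
    layers = [
        Layer("enc1", 100, [0.1, -0.2, 0.15]),
        Layer("enc2", 100, [3.0, -4.0, 2.5]),
        Layer("dec", 100, [1.0, -0.5]),
    ]
    print(allocate_bits(layers, budget_bits=2000))
```

With a 2000-bit budget, the two layers with the smallest activation range (`enc1`, `dec`) drop to 6 bits while the most sensitive layer (`enc2`) stays at 8 bits. Because the activations come from the target users' own audio, a different user group would yield a different sensitivity ordering and therefore a different, personalised scheme for the same budget.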

Code Implementations: 1 repo