SD | AI | AS | Sep 17, 2023

Enhancing Quantised End-to-End ASR Models via Personalisation

arXiv:2309.09136v1 | 4 citations | h-index: 17
Originality: Incremental advance
AI Analysis

The paper addresses the problem of deploying large ASR models on resource-constrained devices, offering an incremental improvement by enhancing quantized models through personalization.

The paper tackles the performance degradation of quantized end-to-end ASR models on resource-constrained devices by proposing personalization for a quantized model (PQM), a strategy that combines speaker adaptive training with model quantization. It reports 15.1% and 23.3% relative WER reductions on quantized Whisper and Conformer models, respectively, compared with the original full-precision models, with a 7x reduction in model size and 1% additional speaker-specific parameters.
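As a reminder of how a relative WER reduction is usually computed, here is a minimal sketch. The function name and the WER values in the usage lines are hypothetical illustrations, not figures from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical example values (NOT taken from the paper).
baseline_wer = 10.0   # WER of the reference system, in percent
adapted_wer = 8.5     # WER of the system being compared, in percent
reduction = relative_wer_reduction(baseline_wer, adapted_wer)
print(f"relative WER reduction: {reduction:.1f}%")
```

A relative reduction is scaled by the baseline, so the same absolute gain counts for more when the baseline WER is already low.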

Recent end-to-end automatic speech recognition (ASR) models have become increasingly large, making them particularly challenging to deploy on resource-constrained devices. Model quantisation is an effective solution, although it sometimes causes the word error rate (WER) to increase. In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker adaptive training (SAT) with model quantisation to improve the performance of heavily compressed models. Specifically, PQM uses a 4-bit NormalFloat Quantisation (NF4) approach for model quantisation and low-rank adaptation (LoRA) for SAT. Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size and 1% additional speaker-specific parameters, 15.1% and 23.3% relative WER reductions were achieved on quantised Whisper and Conformer-based attention-based encoder-decoder ASR models respectively, compared to the original full precision models.
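The combination described in the abstract, a frozen 4-bit base weight plus a small low-rank update, can be illustrated with a toy sketch. This is not the authors' implementation. The codebook below is a simplified stand-in built from evenly spaced normal quantiles (the real NF4 codebook is constructed differently). Treating each weight row as one quantisation block, the `alpha / rank` scaling, and all function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def normal_quantile_codebook(bits=4):
    """Simplified stand-in for an NF4-style codebook (NOT the exact NF4 values)."""
    n = 2 ** bits
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    top = max(abs(q) for q in quantiles)
    return [q / top for q in quantiles]  # levels normalised to [-1, 1]


def quantise_row(row, codebook):
    """Absmax-scale one block (here: one row) and store nearest-level indices."""
    scale = max(abs(w) for w in row) or 1.0
    codes = [min(range(len(codebook)), key=lambda k: abs(codebook[k] - w / scale))
             for w in row]
    return codes, scale


def dequantise_row(codes, scale, codebook):
    return [codebook[k] * scale for k in codes]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def adapted_forward(x, quantised_rows, codebook, lora_a, lora_b, alpha):
    """y = dequant(W_q) x + (alpha / rank) * B (A x); only A and B would be trained."""
    base_weights = [dequantise_row(c, s, codebook) for c, s in quantised_rows]
    base_out = matvec(base_weights, x)
    rank = len(lora_a)
    low_rank_out = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / rank) * d for b, d in zip(base_out, low_rank_out)]


random.seed(0)
d_in, d_out, rank = 8, 6, 2  # toy sizes, chosen only for the example
weights = [[random.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
codebook = normal_quantile_codebook()
quantised_rows = [quantise_row(row, codebook) for row in weights]

# Speaker-specific low-rank factors: A is small random, B starts at zero,
# so the adapted layer initially matches the quantised base layer.
lora_a = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
lora_b = [[0.0] * rank for _ in range(d_out)]

x = [1.0] * d_in
y = adapted_forward(x, quantised_rows, codebook, lora_a, lora_b, alpha=4.0)
extra_params = rank * (d_in + d_out)  # per-speaker parameters added by A and B
print(len(y), extra_params, d_in * d_out)
```

In this toy setting the low-rank factors add `rank * (d_in + d_out)` parameters per adapted layer, which stays small relative to `d_in * d_out` when the rank is much lower than the layer dimensions. That is the general reason a per-speaker adapter can be cheap to store alongside a shared quantised model.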

Code Implementations: 1 repo