CL MMMay 27

TARQ: Tail-Aware Reconstruction Quantization for Rare-Word Robust Automatic Speech Recognition

Xinyu Wang, Ziyu Zhao, Ke Bai, Silin Meng, Dongming Shen, Xiao-Wen Chang, Yixuan HE

arXiv:2605.2780827.9h-index: 3

AI Analysis

For ASR practitioners deploying quantized models, TARQ addresses the mismatch between standard PTQ and tail-sensitive recognition of rare words (names, numerals) without requiring labels or additional training.

TARQ proposes a label-free post-training quantization method that reweights calibration toward rare words, improving rare-word error rate by up to 20% relative without degrading overall WER across 8 ASR backbones and 6 datasets at W4G128.

Data-aware post-training quantization (PTQ) minimizes a per-token reconstruction loss on a small calibration corpus, implicitly weighting positions by their empirical frequency. For \textbf{A}utomatic \textbf{S}peech \textbf{R}ecognition (ASR), this misaligns with tail-sensitive risk: names, numerals, and domain-specific words receive proportionally little calibration mass. We propose \textbf{Tail-Aware Reconstruction Quantization} (\TARQ), a label-free PTQ framework that shifts calibration toward the lexical tail via \textbf{\rareBAL}, a closed-form per-Linear-layer rule equalizing common/tail mass, paired with a metric-consistent residual correction. \TARQ\ requires no entity labels, no curated calibration set, no validation decoding, and no additional training. Across eight ASR backbones and six datasets at W4G128, \TARQ\ improves mean rare-\textbf{W}ord \textbf{E}rror \textbf{R}ate (rare-WER) without an aggregate-WER regression, achieves the lowest cross-corpus rare-WER swing among compared methods, and transfers to entity-rich benchmarks (ProfASR, ContextASR-Speech-En) without entity supervision.

View on arXiv PDF

Similar