ASCL · Dec 14, 2022

Efficient Speech Representation Learning with Low-Bit Quantization

arXiv:2301.00652v1 · 10 citations · h-index: 41
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency improvements for speech processing models. Its contribution is incremental, since it applies existing quantization methods to a specific domain.

The paper tackles the problem of reducing model size and computational complexity in speech representation learning by applying low-bit quantization techniques. On the ASR task of the SUPERB benchmark, 1-bit quantization achieves an 86.32% storage reduction and an 88% estimated runtime reduction, at the cost of a higher word error rate (7.06 -> 15.96).

With the development of hardware for machine learning, newer models often come at the cost of both increased size and increased computational complexity. In an effort to improve the efficiency of these models, we apply and investigate recent quantization techniques on speech representation learning models. The quantization techniques were evaluated on the SUPERB benchmark. On the ASR task, with aggressive quantization to 1 bit, we achieved an 86.32% storage reduction (184.42 -> 25.23) and an 88% estimated runtime reduction (1.00 -> 0.12), with an increased word error rate (7.06 -> 15.96). In comparison with DistilHuBERT, which also aims for model compression, the 2-bit configuration yielded slightly smaller storage (35.84 vs. 46.98), a better word error rate (12.68 vs. 13.37), and a more efficient estimated runtime (0.15 vs. 0.73).
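The reduction percentages quoted in the abstract can be rechecked from the before/after figures it reports. The snippet below is a minimal sketch of that arithmetic only. The helper name `relative_reduction` is hypothetical, the input numbers are copied from the abstract, and the relative word error rate increase is a derived figure that the abstract itself does not state.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction when a quantity goes from `before` to `after`."""
    return (before - after) / before


# 1-bit configuration vs. the unquantized baseline (figures from the abstract).
storage_reduction = relative_reduction(184.42, 25.23)
runtime_reduction = relative_reduction(1.00, 0.12)

# Derived here for context, not reported in the abstract:
# the relative growth in word error rate for the same configuration.
wer_relative_increase = (15.96 - 7.06) / 7.06

print(f"storage reduction:     {storage_reduction:.2%}")
print(f"runtime reduction:     {runtime_reduction:.0%}")
print(f"relative WER increase: {wer_relative_increase:.1%}")
```

Under these inputs the storage and runtime figures agree with the 86.32% and 88% stated in the abstract, while the word error rate more than doubles.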

