Efficient Speech Representation Learning with Low-Bit Quantization
This work addresses efficiency improvements for speech processing models. The contribution is incremental, as it applies existing quantization methods to a specific domain.
The paper tackles the problem of reducing model size and computational complexity in speech representation learning by applying low-bit quantization techniques. On the ASR task of the SUPERB benchmark, it achieves up to 86.32% storage reduction and 88% estimated runtime reduction, at the cost of a higher word error rate.
As hardware for machine learning develops, newer models often grow in both size and computational complexity. In an effort to improve the efficiency of these models, we apply and investigate recent quantization techniques on speech representation learning models and evaluate them on the SUPERB benchmark. On the ASR task, with aggressive quantization to 1 bit, we achieved an 86.32% storage reduction (184.42 -> 25.23) and an 88% reduction in estimated runtime (1.00 -> 0.12), at the cost of an increased word error rate (7.06 -> 15.96). Compared with DistilHuBERT, which also targets model compression, the 2-bit configuration yielded smaller storage (35.84 vs. 46.98), a better word error rate (12.68 vs. 13.37), and a more efficient estimated runtime (0.15 vs. 0.73).
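The percentages quoted above are relative reductions computed from the reported before/after figures. The short Python check below recomputes them as a sanity check. The helper name `relative_reduction` is our own, and storage is left in whatever units the abstract uses, since it does not state them.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage reduction when going from `before` to `after`."""
    return (before - after) / before * 100.0


# 1-bit ASR configuration vs. the full-precision model (figures from the abstract).
storage_1bit = relative_reduction(184.42, 25.23)
runtime_1bit = relative_reduction(1.00, 0.12)  # runtime is normalized so full precision = 1.00

# 2-bit configuration vs. DistilHuBERT (figures from the abstract).
storage_vs_distil = relative_reduction(46.98, 35.84)
runtime_vs_distil = relative_reduction(0.73, 0.15)

print(f"1-bit storage reduction: {storage_1bit:.2f}%")
print(f"1-bit estimated runtime reduction: {runtime_1bit:.2f}%")
print(f"2-bit vs. DistilHuBERT storage: {storage_vs_distil:.2f}% smaller")
print(f"2-bit vs. DistilHuBERT estimated runtime: {runtime_vs_distil:.2f}% lower")
```

Running it confirms the 86.32% and 88% figures, and shows that the 2-bit model's storage is roughly 24% smaller than DistilHuBERT's.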