ASSD · Jun 16, 2020

Quantization of Acoustic Model Parameters in Automatic Speech Recognition Framework

arXiv:2006.09054v2 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses parameter reduction for embedded automatic speech recognition, but it is incremental, since it applies existing quantization methods to a standard setup.

This paper tackles the problem of reducing acoustic model parameters for embedded speech recognition by studying the impact of quantization on word recognition accuracy, and finds that accuracy on the Librispeech dataset varies with the quantization scheme applied.

State-of-the-art hybrid automatic speech recognition (ASR) systems exploit deep neural network (DNN) based acoustic models (AMs) trained with the Lattice-Free Maximum Mutual Information (LF-MMI) criterion, together with n-gram language models. The AMs typically have millions of parameters and require significant parameter reduction to operate on embedded devices. This paper studies the impact of parameter quantization on overall word recognition performance. The following approaches are presented: (i) an AM trained in the Kaldi framework with the conventional factorized TDNN (TDNN-F) architecture, (ii) the TDNN AM built in Kaldi and loaded into the PyTorch toolkit through a C++ wrapper for post-training quantization, (iii) quantization-aware training in PyTorch for the Kaldi TDNN model, and (iv) quantization-aware training in Kaldi. Results obtained on the standard Librispeech setup give an overview of recognition accuracy with respect to the applied quantization scheme.
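All of the approaches above rest on mapping floating-point weights to low-bit integers. The Python below is a minimal sketch of that mapping, assuming per-tensor asymmetric (affine) 8-bit quantization. This is a common scheme but not necessarily the exact one used in the paper. It computes a scale and zero point, quantizes a weight list, and dequantizes it. The helper names (`quant_params`, `quantize`, `dequantize`, `fake_quantize`) are hypothetical. `fake_quantize` stands for the quantize-then-dequantize step that quantization-aware training typically inserts into the forward pass.

```python
"""Sketch of uniform affine 8-bit weight quantization (illustrative only).

Assumption: per-tensor, asymmetric quantization to unsigned integers in
[0, 2^b - 1]; the paper's exact scheme may differ.
"""
from typing import List, Tuple


def quant_params(weights: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Return (scale, zero_point) mapping [w_min, w_max] onto [0, 2^b - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    # Integer code that represents the real value 0.0, clamped to the range.
    zero_point = int(round(qmin - w_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(weights: List[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Round each weight to its integer code and clamp to the valid range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return [max(qmin, min(qmax, int(round(w / scale)) + zero_point))
            for w in weights]


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [(c - zero_point) * scale for c in codes]


def fake_quantize(weights: List[float], num_bits: int = 8) -> List[float]:
    """Quantize then dequantize, as done in the forward pass of QAT."""
    scale, zp = quant_params(weights, num_bits)
    return dequantize(quantize(weights, scale, zp, num_bits), scale, zp)


if __name__ == "__main__":
    w = [-0.5, -0.1, 0.0, 0.25, 0.75]
    s, zp = quant_params(w)
    q = quantize(w, s, zp)
    print(zp, q)
    print(dequantize(q, s, zp))
```

For the example weights `[-0.5, -0.1, 0.0, 0.25, 0.75]`, the zero point comes out as 102 and the codes as `[0, 82, 102, 153, 255]`. Each reconstructed weight lies within half a quantization step (`scale / 2`) of the original. Storing each weight as an 8-bit integer instead of a 32-bit float shrinks weight storage roughly 4×, plus a small overhead for the scale and zero point. In quantization-aware training, the gradient is commonly passed through the rounding step unchanged (the straight-through estimator), so the network learns weights that tolerate the rounding. PyTorch ships its own quantization tooling for both post-training and quantization-aware workflows.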
