CL Aug 2, 2018

Linguistic Search Optimization for Deep Learning Based LVCSR

arXiv:1808.00687v11 citations
Originality Synthesis-oriented
AI Analysis

This work addresses efficiency bottlenecks in speech recognition systems, an incremental improvement for applications that require large-scale transcription.

The paper tackles the computational overhead in deep learning-based large vocabulary continuous speech recognition (LVCSR) by proposing general ideas and initial trials to optimize both acoustic model inference and linguistic search stages, aiming to enable wider application of LVCSR.

Recent advances in deep learning based large vocabulary continuous speech recognition (LVCSR) have created growing demand for large-scale speech transcription. The inference process of a speech recognizer finds the sequence of labels whose corresponding acoustic and language models best match the input features [1]. The main computation has two stages: acoustic model (AM) inference and linguistic search over a weighted finite-state transducer (WFST). The large computational overhead of both stages hampers the wide application of LVCSR. Benefiting from stronger classifiers, deep learning, and more powerful computing devices, we propose general ideas and some initial trials to address these fundamental problems.
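To make the two-stage structure concrete, here is a minimal toy sketch, not the paper's method. It assumes the AM inference stage has already produced per-frame log-scores for each label, and it replaces the WFST search with plain Viterbi decoding over a small bigram transition table. The function name `viterbi_decode`, the two-label vocabulary, and all probabilities are hypothetical values chosen for illustration.

```python
import math

def viterbi_decode(am_logp, lm_logp, labels):
    """Return the best label sequence and its total log-score.

    am_logp[t][j]: acoustic log-score of label j at frame t (stage 1 output).
    lm_logp[i][j]: log-probability of moving from label i to label j
                   (a toy stand-in for the linguistic search graph).
    """
    n = len(labels)
    # Best score of any path ending in each label after the first frame.
    score = list(am_logp[0])
    backptr = []
    for frame in am_logp[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Choose the predecessor maximizing path score plus transition.
            best_i = max(range(n), key=lambda i: score[i] + lm_logp[i][j])
            new_score.append(score[best_i] + lm_logp[best_i][j] + frame[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final label.
    j = max(range(n), key=lambda i: score[i])
    best_total = score[j]
    path = [j]
    for ptr in reversed(backptr):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [labels[k] for k in path], best_total

# Hypothetical inputs: three frames, two labels.
labels = ["a", "b"]
am_logp = [[math.log(p) for p in row]
           for row in [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]]
lm_logp = [[math.log(p) for p in row]
           for row in [[0.7, 0.3], [0.4, 0.6]]]

path, total = viterbi_decode(am_logp, lm_logp, labels)
print(path, math.exp(total))
```

In this toy, the search cost grows with the number of frames times the square of the number of labels. A real LVCSR decoder searches a far larger graph, which is one way to see why the linguistic search stage can dominate computation.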
