LG, SD, AS, ML
Nov 20, 2019

CAT: CRF-based ASR Toolkit

arXiv:1911.08747v1, 7 citations, Has Code
Originality: Incremental advance
AI Analysis

This toolkit addresses the need for efficient and flexible end-to-end speech recognition tools for researchers and practitioners, though it is incremental as it builds on existing CRF and CTC methods.

The authors introduce CAT, an open-source toolkit for automatic speech recognition built on a CRF-based framework with a CTC-inspired state topology. On benchmarks such as Switchboard and Aishell, it achieves state-of-the-art results among end-to-end models with fewer parameters and is competitive with hybrid DNN-HMM models.

In this paper, we present a new open-source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit). A key feature of CAT is discriminative training in the framework of conditional random fields (CRFs), particularly with a connectionist temporal classification (CTC) inspired state topology. CAT contains a full-fledged implementation of CTC-CRF and provides a complete workflow for CRF-based end-to-end speech recognition. Evaluation results on English and Chinese benchmarks such as Switchboard and Aishell show that CAT obtains state-of-the-art results among existing end-to-end models with fewer parameters, and is competitive with hybrid DNN-HMM models. Towards flexibility, we show that i-vector based speaker-adapted recognition and a latency control mechanism can be explored easily and effectively in CAT. We hope CAT, especially the CRF-based framework and software, will be of broad interest to the community, and can be further explored and improved.
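
To make the phrase "CTC-inspired state topology" concrete, the sketch below shows the standard CTC collapse mapping, which turns a frame-level state path into a label sequence by merging consecutive repeats and then dropping blanks. It is a minimal illustration and is not taken from CAT's code; the function name `ctc_collapse`, the blank symbol `<b>`, and the example path are all assumptions made for this example.

```python
from itertools import groupby

# Hypothetical blank symbol; real toolkits typically use an integer index.
BLANK = "<b>"


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level state path to a label sequence.

    Standard CTC mapping: first merge runs of identical consecutive
    states, then remove every blank.
    """
    merged = [state for state, _ in groupby(path)]
    return [state for state in merged if state != blank]


# One state per acoustic frame (illustrative values only).
path = ["<b>", "k", "k", "<b>", "ae", "ae", "<b>", "<b>", "t", "t"]
print(ctc_collapse(path))
```

Many different frame-level paths collapse to the same label sequence, and a blank between two identical labels is what keeps a genuine repeat from being merged.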

Code Implementations: 2 repos