AS LG SD ML

May 27, 2020

CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

arXiv:2005.13326v2

11.324 citations

Has Code

Originality: Incremental advance

AI Analysis

This toolkit targets data efficiency and low latency in speech recognition. For researchers and practitioners, it offers a modular CTC-CRF framework with a much simpler training pipeline than fine-tuned hybrid systems.

The authors introduce CAT, a CTC-CRF based ASR toolkit that combines the data efficiency of the hybrid approach with the simplicity of the end-to-end approach. On English and Chinese benchmarks, its results are comparable to fine-tuned hybrid models in Kaldi, and it outperforms existing non-modularized end-to-end models on limited-scale datasets. They also propose contextualized soft forgetting, which enables streaming ASR without accuracy degradation.

In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community, and can be further explored and improved.
