CL · Dec 18, 2018

wav2letter++: The Fastest Open-source Speech Recognition System

Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, Ronan Collobert

arXiv:1812.07625v1

Originality: Synthesis-oriented

AI Analysis

wav2letter++ gives speech recognition researchers and practitioners a high-performance tool that enables faster iteration and model tuning. The contribution is incremental, however: it optimizes the implementation of existing deep learning methods rather than proposing new ones.

The paper addresses the need for efficient speech recognition systems by introducing wav2letter++, an open-source framework written in C++. In some cases it trains more than 2x faster than other optimized frameworks, and its training times scale linearly up to 64 GPUs (the largest configuration tested) for models with 100 million parameters.

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.
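The abstract's two quantitative claims, "more than 2x faster" and "scale linearly to 64 GPUs", are both ratios. The following minimal Python sketch shows how such ratios are commonly computed, assuming a fixed workload and throughput measured in samples per second. The function names `speedup` and `scaling_efficiency` and the numbers in the usage lines are hypothetical illustrations, not code or measurements from the paper or from wav2letter++.

```python
# Hypothetical helpers for reading speed and scaling claims.
# The numbers in the __main__ section are illustrative only, not paper results.


def speedup(baseline_time: float, new_time: float) -> float:
    """How many times faster `new_time` is than `baseline_time` on the same workload."""
    if baseline_time <= 0 or new_time <= 0:
        raise ValueError("times must be positive")
    return baseline_time / new_time


def scaling_efficiency(throughput_1: float, throughput_n: float, n_gpus: int) -> float:
    """Parallel efficiency: N-GPU throughput divided by N times single-GPU throughput.

    1.0 means perfectly linear scaling; values below 1.0 mean some time is lost
    to communication or synchronization between GPUs.
    """
    if n_gpus < 1 or throughput_1 <= 0 or throughput_n <= 0:
        raise ValueError("throughputs must be positive and n_gpus >= 1")
    return throughput_n / (n_gpus * throughput_1)


if __name__ == "__main__":
    # Made-up throughputs (samples/s): 100 on 1 GPU, 6000 on 64 GPUs.
    print(f"{scaling_efficiency(100.0, 6000.0, 64):.4f}")  # 0.9375
    # Made-up per-batch times (ms): baseline 450, faster framework 200.
    print(f"{speedup(450.0, 200.0):.2f}")  # 2.25
```

Under this definition, "linear scaling" means the efficiency stays close to 1.0 as the GPU count grows, and "more than 2x faster" means `speedup` exceeds 2 on the same workload.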
