CL · Dec 18, 2018

wav2letter++: The Fastest Open-source Speech Recognition System

arXiv:1812.07625v1 · 163 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

wav2letter++ gives speech recognition researchers and practitioners a high-performance tool that enables faster iteration and model tuning. The contribution is incremental, however: it optimizes the implementation of existing deep learning methods rather than proposing new ones.

The paper addresses the need for efficient speech recognition systems by introducing wav2letter++, an open-source C++ framework. In some cases it trains more than 2x faster than other optimized frameworks, and its training scales linearly up to 64 GPUs (the most the authors tested) for models with 100 million parameters.

This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. Here we explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2x faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++'s training times scale linearly to 64 GPUs, the highest we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.
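One common way to quantify the abstract's claim of linear scaling is to compare throughput at N GPUs against single-GPU throughput: perfectly linear scaling means an N-fold speedup, i.e. a scaling efficiency of 1.0. The minimal Python sketch below illustrates that arithmetic. The throughput numbers and the helper names `speedup` and `scaling_efficiency` are hypothetical and chosen for illustration only; they are not measurements or code from the paper.

```python
# Hypothetical throughput (samples/second) keyed by GPU count.
# Illustrative values only; these are NOT results reported in the paper.
throughput = {1: 100.0, 8: 790.0, 64: 6250.0}


def speedup(n_gpus: int, baseline: int = 1) -> float:
    """Throughput at n_gpus relative to throughput at the baseline GPU count."""
    return throughput[n_gpus] / throughput[baseline]


def scaling_efficiency(n_gpus: int, baseline: int = 1) -> float:
    """Measured speedup divided by the ideal linear speedup (1.0 = perfectly linear)."""
    return speedup(n_gpus, baseline) / (n_gpus / baseline)


for n in sorted(throughput):
    print(f"{n:>2} GPUs: speedup {speedup(n):.1f}x, efficiency {scaling_efficiency(n):.3f}")
```

With these made-up numbers, efficiency stays close to 1.0 as the GPU count grows, which is what "scales linearly" would look like in practice; real values depend on model size, interconnect, and batch configuration.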

Code Implementations: 8 repos

Data from Papers with Code (CC-BY-SA-4.0)

