AS, CL, SD | Apr 11, 2021

A Toolbox for Construction and Analysis of Speech Datasets

arXiv:2104.04896v3 | 14 citations | Has Code
AI Analysis

The toolbox offers a practical solution for researchers and developers working on speech recognition and synthesis systems, though the contribution is incremental because the construction tool builds on existing methods.

The authors address the problem of constructing and analyzing speech datasets by introducing a toolbox with two parts: a construction tool based on prior work and a novel open-source exploration tool. They apply the toolbox to create a Russian dataset and to analyze existing datasets such as Multilingual LibriSpeech and Mozilla Common Voice.

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on the work of Kürzinger et al., and, to the best of our knowledge, the dataset exploration tool is the world's first open-source tool of its kind. We demonstrate how to apply these tools to create a Russian speech dataset and to analyze existing speech datasets (Multilingual LibriSpeech, Mozilla Common Voice). The tools are open-sourced as part of the NeMo framework.
