AS, CL, SD (Sep 14, 2021)

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit

arXiv:2109.06912v1, 666 citations, Has Code
Originality: Synthesis-oriented
AI Analysis

This toolkit addresses the need for scalable, easily integrated speech synthesis tools for researchers and developers. Its contribution is incremental, since it builds on the existing fairseq framework.

The paper introduces fairseq S^2, a toolkit for speech synthesis that implements autoregressive and non-autoregressive text-to-speech models with multi-speaker variants, and includes preprocessing tools to reduce data curation needs and automatic metrics for faster development.

This paper presents fairseq S^2, a fairseq extension for speech synthesis. We implement a number of autoregressive (AR) and non-AR text-to-speech models, and their multi-speaker variants. To enable training speech synthesis models with less curated data, a number of preprocessing tools are built and their importance is shown empirically. To facilitate faster iteration of development and analysis, a suite of automatic metrics is included. Apart from the features added specifically for this extension, fairseq S^2 also benefits from the scalability offered by fairseq and can be easily integrated with other state-of-the-art systems provided in this framework. The code, documentation, and pre-trained models are available at https://github.com/pytorch/fairseq/tree/master/examples/speech_synthesis.
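The abstract does not name the automatic metrics in the suite. As an illustration of the kind of objective measure such a suite might contain, the sketch below computes mel-cepstral distortion (MCD), a common metric for comparing synthesized and reference speech. This is an assumption made for illustration, not the toolkit's implementation: the function name `mel_cepstral_distortion` and its list-of-frames input format are hypothetical, and the two sequences are assumed to be time-aligned already (for example by dynamic time warping).

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Hypothetical helper, not part of fairseq. Each frame is a list of
    mel-cepstral coefficients. The 0th coefficient (overall energy) is
    skipped, as is conventional for this metric.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("expected two non-empty, equal-length frame sequences")

    # Standard MCD scaling constant: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        squared_diff = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared_diff)

    # Average the per-frame distortion over all frames
    return total / len(ref_frames)
```

Identical sequences give a distortion of 0 dB, and larger values indicate a greater spectral mismatch between the synthesized and reference audio. Averaging per-frame values in this way is what makes a metric like this usable for quick comparisons between model checkpoints without running a listening test.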

Code Implementations: 4 repos
