SDAS, Nov 28, 2018

UFANS: U-shaped Fully-Parallel Acoustic Neural Structure For Statistical Parametric Speech Synthesis With 20X Faster

arXiv:1811.12208v1, 1 citation
Originality: Incremental advance
AI Analysis

This addresses a speed bottleneck in TTS systems for real-time applications, though it is an incremental improvement over existing methods.

The paper tackles the slow sequential computation of autoregressive RNNs in statistical parametric speech synthesis by proposing UFANS, a U-shaped fully-parallel deconvolutional alternative, achieving over 20 times faster training and inference on GPU with comparable speech quality.

Neural networks with auto-regressive structures, such as Recurrent Neural Networks (RNNs), have become the most appealing structures for acoustic modeling in parametric text-to-speech synthesis (TTS) in recent studies. Despite their prominent capacity to capture long-term dependency, these models consist of massive sequential computations that cannot be fully parallelized. In this paper, we propose a U-shaped Fully-parallel Acoustic Neural Structure (UFANS), which is a deconvolutional alternative to RNNs for Statistical Parametric Speech Synthesis (SPSS). The experiments verify that our proposed model is over 20 times faster than an RNN-based acoustic model in both training and inference on GPU, with comparable speech quality. Furthermore, we also investigate how much long-term information dependence really matters to synthesized speech quality.
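
The idea of a U-shaped, fully parallel structure can be illustrated with a toy sketch. This is not the UFANS architecture from the paper: it assumes a single channel, a fixed (unlearned) kernel, average pooling for downsampling, and nearest-neighbour upsampling as a stand-in for a learned deconvolution. All function names are hypothetical. What it shows is that every output frame is computed from the input alone, with no frame-to-frame recurrence, while each extra level of the U widens the context a frame can see.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1D convolution that keeps the sequence length.

    Each output frame depends only on the input, never on other output
    frames, so all frames could be computed in parallel.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[t + i] for i, k in enumerate(kernel))
        for t in range(len(x))
    ]


def downsample(x):
    """Average-pool by a factor of 2 (halves the time resolution)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Nearest-neighbour upsampling by 2 (assumed stand-in for deconvolution)."""
    return [v for v in x for _ in range(2)]


def u_shaped(x, kernel, depth):
    """Toy U-shape: convolve, go down one level, come back up, add a skip."""
    h = conv1d_same(x, kernel)
    if depth == 0 or len(h) < 2:
        return h
    up = upsample(u_shaped(downsample(h), kernel, depth - 1))
    up = up + [up[-1]] * (len(h) - len(up))  # restore length if it was odd
    return [a + b for a, b in zip(h, up)]  # skip connection


# Feed a single impulse and count how many output frames it influences.
impulse = [0.0] * 16
impulse[8] = 1.0
kernel = [1.0, 1.0, 1.0]
shallow = u_shaped(impulse, kernel, depth=0)
deep = u_shaped(impulse, kernel, depth=2)
print(sum(v != 0 for v in shallow), sum(v != 0 for v in deep))
```

With a plain convolution the impulse reaches only its immediate neighbours. Adding levels to the U spreads its influence over a much longer span without introducing any sequential step over time, which is the property the abstract contrasts with RNNs.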

