SD AS Nov 28, 2018

UFANS: U-shaped Fully-Parallel Acoustic Neural Structure For Statistical Parametric Speech Synthesis With 20X Faster

Dabiao Ma, Zhiba Su, Yuhao Lu, Wenxuan Wang, Zhen Li

arXiv:1811.12208v1 2.91 citations

Originality: Incremental advance

AI Analysis

The work addresses a speed bottleneck in TTS systems for real-time applications, though it is an incremental improvement over existing methods.

The paper tackles the slow sequential computation of autoregressive RNNs in statistical parametric speech synthesis by proposing UFANS, a U-shaped fully-parallel deconvolutional alternative, achieving over 20 times faster training and inference on GPU with comparable speech quality.

Neural networks with auto-regressive structures, such as Recurrent Neural Networks (RNNs), have become the most appealing structures for acoustic modeling in parametric text-to-speech synthesis (TTS) in recent studies. Despite their prominent capacity to capture long-term dependencies, these models consist of massive sequential computations that cannot be fully parallelized. In this paper, we propose a U-shaped Fully-parallel Acoustic Neural Structure (UFANS), a deconvolutional alternative to RNNs for Statistical Parametric Speech Synthesis (SPSS). Experiments verify that the proposed model is over 20 times faster than an RNN-based acoustic model in both training and inference on GPU, with comparable speech quality. Furthermore, we investigate how much long-range information dependence really matters to synthesized speech quality.
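To make the "U-shaped, fully parallel" idea concrete, here is a minimal pure-Python sketch. It is not the UFANS architecture from the paper: the single channel, the fixed averaging kernel, the pair-averaging downsample, the repeat-based upsample, and the names `conv1d_same`, `downsample`, `upsample`, and `u_shaped` are all assumptions made for illustration. It only shows that each output frame depends on the inputs and not on other outputs (so frames could be computed in parallel), and that the down/up path widens the span of input frames that influence each output.

```python
def conv1d_same(x, kernel):
    """Zero-padded 1D convolution; every output frame reads only the input."""
    half = len(kernel) // 2
    n = len(x)
    out = []
    for t in range(n):  # iterations are independent of each other
        acc = 0.0
        for k, w in enumerate(kernel):
            i = t + k - half
            if 0 <= i < n:
                acc += w * x[i]
        out.append(acc)
    return out


def downsample(x):
    """Halve the time resolution by averaging adjacent frame pairs."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the time resolution by repeating each frame
    (a stand-in for a learned stride-2 deconvolution)."""
    out = []
    for v in x:
        out.extend([v, v])
    return out


def u_shaped(x, depth, kernel):
    """Conv, go down one level, recurse, come back up, add the skip path."""
    h = conv1d_same(x, kernel)
    if depth == 0 or len(h) < 2:
        return h
    up = upsample(u_shaped(downsample(h), depth - 1, kernel))
    up = up + [up[-1]] * (len(h) - len(up))  # pad when len(h) is odd
    return [a + b for a, b in zip(h, up)]


if __name__ == "__main__":
    kernel = [1 / 3, 1 / 3, 1 / 3]  # hypothetical fixed weights, not learned
    impulse = [0.0] * 16
    impulse[8] = 1.0
    for depth in (0, 1, 2):
        y = u_shaped(impulse, depth, kernel)
        touched = sum(1 for v in y if v != 0)
        print(f"depth={depth}: {touched} output frames affected by one input frame")
```

Running the script prints how many output frames a single input frame reaches at each depth. A deeper U reaches more frames without any frame-to-frame recurrence, which is the trade-off the abstract alludes to when it asks how much long-range dependence matters.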
