SD · LG · Jun 17, 2015

Deep Denoising Auto-encoder for Statistical Speech Synthesis

Zhenzhou Wu, Shinji Takaki, Junichi Yamagishi

arXiv:1506.05268v1 · 2.67 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the quality of synthetic speech in applications such as text-to-speech systems, but it appears incremental because it builds on existing auto-encoder methods.

The paper tackles the problem of extracting better acoustic features for speech synthesis by proposing a deep denoising auto-encoder technique, which improved the quality of synthetic speech in both analysis-by-synthesis and text-to-speech experiments.

This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high-dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and text-to-speech experiments. Our results confirm that the proposed method increases the quality of synthetic speech in both experiments.
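To make the idea of a stacked denoising auto-encoder as a feature extractor more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the layer sizes (16 → 8 → 4), sigmoid encoders, linear decoders, masking noise, learning rate, greedy layer-wise training without fine-tuning, and the toy "spectral" frames are all assumptions chosen for illustration, and every name (`DenoisingAutoencoder`, `train_stack`, `extract`, `reconstruct`) is hypothetical. Each layer is trained to recover its clean input from a randomly corrupted copy, and the deepest hidden layer's activations serve as the low-dimensional features; decoding them back down the stack mimics an analysis-by-synthesis round trip.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DenoisingAutoencoder:
    """One hypothetical DAE layer: sigmoid encoder, linear decoder, squared error."""

    def __init__(self, n_in, n_hidden, rng):
        scale = 1.0 / math.sqrt(n_in)
        self.W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[rng.uniform(-scale, scale) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sum(v * hj for v, hj in zip(row, h)) + ci
                for row, ci in zip(self.V, self.c)]

    def train_step(self, x, rng, drop_prob, lr):
        # Assumed corruption: masking noise zeroes each input dim with prob drop_prob.
        x_noisy = [0.0 if rng.random() < drop_prob else xi for xi in x]
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Gradient of 0.5 * ||y - x||^2 w.r.t. y; the target is the CLEAN input.
        dy = [yi - xi for yi, xi in zip(y, x)]
        # Backpropagate to hidden pre-activations (computed before V is updated).
        dh = [sum(self.V[i][j] * dy[i] for i in range(len(dy))) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for i, dyi in enumerate(dy):
            for j, hj in enumerate(h):
                self.V[i][j] -= lr * dyi * hj
            self.c[i] -= lr * dyi
        for j, dhj in enumerate(dh):
            for k, xk in enumerate(x_noisy):
                self.W[j][k] -= lr * dhj * xk
            self.b[j] -= lr * dhj
        return 0.5 * sum(d * d for d in dy)


def train_stack(data, layer_sizes, epochs=20, drop_prob=0.2, lr=0.05, seed=0):
    """Greedy layer-wise training: each DAE denoises the codes of the layer below."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_in, n_hidden in zip(layer_sizes, layer_sizes[1:]):
        dae = DenoisingAutoencoder(n_in, n_hidden, rng)
        for _ in range(epochs):
            for x in inputs:
                dae.train_step(x, rng, drop_prob, lr)
        layers.append(dae)
        inputs = [dae.encode(x) for x in inputs]
    return layers


def extract(layers, x):
    """Map a high-dimensional frame to the deepest (low-dimensional) code."""
    for dae in layers:
        x = dae.encode(x)
    return x


def reconstruct(layers, code):
    """Decode a code back down the stack to the input dimensionality."""
    for dae in reversed(layers):
        code = dae.decode(code)
    return code


# Toy data (an assumption, not real speech): smooth 16-bin "log-spectral" frames.
rng = random.Random(1)
frames = []
for _ in range(100):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    frames.append([a * math.cos(math.pi * k / 15) + b * math.sin(math.pi * k / 15)
                   for k in range(16)])

layers = train_stack(frames, [16, 8, 4])
code = extract(layers, frames[0])
print(len(code))  # -> 4
```

The corruption is what makes the extractor stochastic during training: because each layer must rebuild the clean frame from a damaged one, it is pushed toward codes that capture the frame's underlying structure rather than copying individual bins. A real system would operate on actual spectral envelopes, use larger layers, and typically fine-tune the whole stack after layer-wise pre-training; this sketch omits that step.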
