SD, LG · Jun 17, 2015

Deep Denoising Auto-encoder for Statistical Speech Synthesis

arXiv:1506.05268v1 · 7 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses speech synthesis quality for applications such as text-to-speech systems, but it appears incremental because it builds on existing auto-encoder methods.

The paper tackles the problem of extracting better acoustic features for speech synthesis by proposing a deep denoising auto-encoder technique, which improved the quality of synthetic speech in both analysis-by-synthesis and text-to-speech experiments.

This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high-dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and text-to-speech experiments. Our results confirm that the proposed method increases the quality of synthetic speech in both experiments.
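The abstract describes the method only at a high level. The Python sketch below illustrates the general mechanism of a denoising auto-encoder, not the authors' implementation. Each input vector x is corrupted with additive Gaussian noise and encoded into a low-dimensional hidden code by a sigmoid layer. The code is then decoded linearly, and the network is trained to reconstruct the clean x by minimizing 0.5·||x̂ − x||². Several pieces are assumptions chosen for brevity: a single hidden layer with tied weights, Gaussian corruption, plain SGD, and toy 8-dimensional "spectral" vectors generated from two latent factors. The paper's deep (stacked) architecture, corruption scheme, spectral features and layer sizes are not given in this summary, and all names here (`DenoisingAutoencoder`, `make_toy_spectra`, `train`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingAutoencoder:
    """Hypothetical single-hidden-layer denoising auto-encoder with tied weights."""

    def __init__(self, n_in, n_hidden, noise_std=0.1, lr=0.05, seed=0):
        self.rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        self.noise_std, self.lr = noise_std, lr
        scale = 1.0 / math.sqrt(n_in)
        # W[j][i]: weight from input dim i to hidden unit j (decoder reuses its transpose)
        self.W = [[self.rng.uniform(-scale, scale) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # encoder bias
        self.c = [0.0] * n_in      # decoder bias

    def encode(self, x):
        """Map an input vector to low-dimensional sigmoid features."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        """Linear reconstruction (real-valued spectra) using the transposed weights."""
        return [sum(self.W[j][i] * h[j] for j in range(self.n_hidden)) + self.c[i]
                for i in range(self.n_in)]

    def train_step(self, x):
        """One SGD step: corrupt x, reconstruct, and move toward the CLEAN x."""
        x_tilde = [xi + self.rng.gauss(0.0, self.noise_std) for xi in x]
        h = self.encode(x_tilde)
        x_hat = self.decode(h)
        err = [xh - xi for xh, xi in zip(x_hat, x)]  # dLoss/dx_hat for 0.5*||x_hat - x||^2
        loss = 0.5 * sum(e * e for e in err)
        # Backpropagate to the hidden pre-activations (computed before W changes).
        dh = [sum(self.W[j][i] * err[i] for i in range(self.n_in)) * h[j] * (1.0 - h[j])
              for j in range(self.n_hidden)]
        for j in range(self.n_hidden):
            for i in range(self.n_in):
                # Tied weights: gradient is the decoder term plus the encoder term.
                self.W[j][i] -= self.lr * (err[i] * h[j] + dh[j] * x_tilde[i])
            self.b[j] -= self.lr * dh[j]
        for i in range(self.n_in):
            self.c[i] -= self.lr * err[i]
        return loss


def make_toy_spectra(n, dim=8, seed=1):
    """Toy stand-in for spectral frames: each vector is driven by 2 latent factors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a, b = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        data.append([a * math.cos(0.3 * k) + b * math.sin(0.5 * k) for k in range(dim)])
    return data


def train(dae, data, epochs):
    """Return the mean per-frame training loss for each epoch."""
    history = []
    for _ in range(epochs):
        total = sum(dae.train_step(x) for x in data)
        history.append(total / len(data))
    return history


data = make_toy_spectra(200)
dae = DenoisingAutoencoder(n_in=8, n_hidden=3)
history = train(dae, data, epochs=30)
features = dae.encode(data[0])  # 3 learned features for an 8-dim frame
print(f"mean loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
print("features:", [round(f, 3) for f in features])
```

After training, `dae.encode(x)` acts as the feature extractor: it maps an 8-dimensional vector to 3 hidden activations in (0, 1). The noise is applied only to the input, while the loss is measured against the clean vector. That is what makes the model "denoising" rather than a plain auto-encoder. A deep variant would stack several such layers, which this sketch deliberately omits.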

