SDCL · Aug 3, 2015

Significance of Maximum Spectral Amplitude in Sub-bands for Spectral Envelope Estimation and Its Application to Statistical Parametric Speech Synthesis

arXiv:1508.00354v1
Originality: Incremental advance
AI Analysis

This paper addresses spectral envelope estimation for speech synthesis and offers a more interpretable alternative to cepstral methods. It appears incremental because it matches, rather than surpasses, existing performance.

The paper tackles spectral envelope estimation by proposing a method based on the maximum spectral amplitude in sub-bands (MSASB). The method parametrizes the envelope directly in the spectral domain, which makes it easier to interpret than cepstral representations. Analysis-by-synthesis results show it is comparable to STRAIGHT, and the parametrization is also evaluated in statistical parametric speech synthesis with deep neural networks.

In this paper, we propose a technique for spectral envelope estimation using the maximum values in the sub-bands of the Fourier magnitude spectrum (MSASB). Most other methods in the literature parametrize the spectral envelope in the cepstral domain, for example as a Mel-generalized cepstrum. Such cepstral-domain representations, although compact, are not readily interpretable. Our method overcomes this difficulty by parametrizing the envelope in the spectral domain itself. In our experiments, the spectral envelope estimated using the MSASB method was incorporated into the STRAIGHT vocoder. Both objective and subjective results of analysis-by-synthesis indicate that the proposed method is comparable to STRAIGHT. We also evaluate the effectiveness of the proposed parametrization in a statistical parametric speech synthesis framework using deep neural networks.
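The core idea of the abstract, taking the maximum of the Fourier magnitude spectrum in each sub-band, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `msasb_envelope`, the equal-width band layout, the Hann window, the default sizes, and the linear interpolation between peaks are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def msasb_envelope(frame, n_bands=40, n_fft=1024):
    """Hypothetical sketch: per-band maxima of the FFT magnitude spectrum."""
    # Assumption: Hann analysis window; the paper may use a different one.
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft))  # n_fft // 2 + 1 bins

    # Assumption: equal-width sub-bands; a perceptual (e.g. mel) layout
    # is equally plausible and would only change these edges.
    edges = np.linspace(0, len(magnitude), n_bands + 1).astype(int)

    peaks = np.empty(n_bands)    # maximum amplitude in each sub-band
    centers = np.empty(n_bands)  # bin index where that maximum occurs
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        k = lo + np.argmax(magnitude[lo:hi])
        peaks[b] = magnitude[k]
        centers[b] = k

    # Assumption: a full-resolution envelope is obtained by linear
    # interpolation between the peak positions.
    envelope = np.interp(np.arange(len(magnitude)), centers, peaks)
    return peaks, envelope
```

Here `peaks` is the compact parameter vector (one value per sub-band) and `envelope` is a full-resolution curve that a vocoder could consume. Because each parameter is simply the amplitude of a frequency band, it can be read directly, which is the interpretability argument the abstract makes against cepstral coefficients.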

Code Implementations: 1 repo
