CL · Jul 4, 2018

Generating Mandarin and Cantonese F0 Contours with Decision Trees and BLSTMs

arXiv:1807.01682v1

Originality: Incremental advance

AI Analysis

This work addresses speech synthesis quality for tonal languages like Mandarin and Cantonese, presenting an incremental improvement over existing methods.

The paper tackles the modeling of fundamental frequency (F0) contours in Mandarin and Cantonese speech. It proposes an Additive-BLSTM model that predicts a base contour and a residual contour. Compared with decision tree methods, the model scores better on the objective measures of RMSE and correlation and is preferred by listeners in subjective tests.

This paper models fundamental frequency (F0) contours in both Mandarin and Cantonese speech with decision trees and deep neural networks (DNNs). Different F0 representations and model architectures are tested for both model families. A new model, Additive-BLSTM (additive bidirectional long short-term memory), is proposed; it predicts a base F0 contour and a residual F0 contour with two BLSTMs. In terms of the objective measures of RMSE and correlation, the best-performing decision tree configuration combines tone-dependent trees with sample normalization and delta feature regularization, while the new Additive-BLSTM model with delta feature regularization performs even better. Subjective listening tests on both Mandarin and Cantonese, comparing a Random Forest model (multiple decision trees) with the Additive-BLSTM model, confirmed the advantage of the new model according to the listeners' preference.
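The two objective measures named above, RMSE and correlation, can be illustrated with a minimal sketch. It assumes that F0 contours are given as equal-length lists of values in Hz and that "correlation" means the Pearson coefficient; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equal-length F0 contours."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("contours must be non-empty and of equal length")
    squared_error = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared_error / len(predicted))


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length F0 contours."""
    n = len(predicted)
    if n != len(reference) or n < 2:
        raise ValueError("contours must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r)
                     for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return covariance / math.sqrt(var_p * var_r)


# Made-up example values in Hz, for illustration only.
reference_f0 = [200.0, 210.0, 220.0, 215.0, 205.0]
predicted_f0 = [198.0, 212.0, 217.0, 216.0, 207.0]

print("RMSE (Hz):", rmse(predicted_f0, reference_f0))
print("Correlation:", pearson_correlation(predicted_f0, reference_f0))
```

A lower RMSE means the predicted contour is closer to the reference in absolute terms, while a higher correlation means the predicted contour follows the shape of the reference more closely, so the two measures are complementary.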
