SD, AS · Nov 6, 2017

Mandarin tone modeling using recurrent neural networks

arXiv:1711.01946v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses Mandarin tone classification, offering an incremental improvement for speech recognition systems.

The authors tackle Mandarin tone classification with an Encoder-Classifier framework built on recurrent neural networks. The framework improves classification accuracy by flexibly handling heterogeneous inputs, both sequential and segmental.

We propose an Encoder-Classifier framework to model Mandarin tones using recurrent neural networks (RNNs). In this framework, the frames of features extracted for tone classification are fed into the RNN and cast into a fixed-dimensional vector (the tone embedding), which is then classified into tone types by a softmax layer, along with other auxiliary inputs. We investigate various configurations that help improve the model, including pooling, feature splicing, and the use of syllable-level tone embeddings. In addition, the tone embeddings and durations of the contextual syllables are exploited to facilitate tone classification. Experimental results on Mandarin tone classification show that the proposed network setups improve tone classification accuracy. The results indicate that the RNN encoder-classifier tone model flexibly accommodates heterogeneous inputs (sequential and segmental) and hence combines the advantages of sequential classification tone models and segmental classification tone models.
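The encoder-classifier idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the plain tanh RNN cell, the mean pooling, the single duration value used as the auxiliary input, the five output classes (four lexical tones plus the neutral tone), and all names and dimensions are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def encode(frames, W_in, W_rec):
    """Encoder: run a tanh RNN over the frames, then mean-pool the hidden
    states into one fixed-size vector (the hypothetical tone embedding)."""
    hidden = len(W_rec)
    h = [0.0] * hidden
    pooled = [0.0] * hidden
    for x in frames:
        a = matvec(W_in, x)
        r = matvec(W_rec, h)
        h = [math.tanh(ai + ri) for ai, ri in zip(a, r)]
        pooled = [p + hi for p, hi in zip(pooled, h)]
    return [p / len(frames) for p in pooled]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classify(frames, aux, params):
    """Classifier: concatenate the sequential embedding with segmental
    auxiliary inputs and map the result to tone probabilities."""
    emb = encode(frames, params["W_in"], params["W_rec"])
    logits = matvec(params["W_out"], emb + aux)
    return softmax(logits)

# Assumed sizes: 3 features per frame, 8 hidden units, 1 auxiliary input, 5 tone classes.
rng = random.Random(0)
FEAT, HIDDEN, AUX, TONES = 3, 8, 1, 5
params = {
    "W_in": rand_matrix(HIDDEN, FEAT, rng),
    "W_rec": rand_matrix(HIDDEN, HIDDEN, rng),
    "W_out": rand_matrix(TONES, HIDDEN + AUX, rng),
}

# One syllable: 12 frames of made-up features plus a made-up duration of 0.25.
frames = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(12)]
probs = classify(frames, [0.25], params)
```

Because pooling reduces any number of frames to a vector of the same size, syllables of different lengths can share one softmax layer, and segmental inputs such as duration are simply appended to the embedding before classification.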
