LGMay 25, 2021
Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM frameworkNirmalya Sen, Md Sahidullah, Hemant Patil et al.
The performance of speaker recognition system is highly dependent on the amount of speech used in enrollment and test. This work presents a detailed experimental review and analysis of the GMM-SVM based speaker recognition system in presence of duration variability. This article also reports a comparison of the performance of GMM-SVM classifier with its precursor technique Gaussian mixture model-universal background model (GMM-UBM) classifier in presence of duration variability. The goal of this research work is not to propose a new algorithm for improving speaker recognition performance in presence of duration variability. However, the main focus of this work is on utterance partitioning (UP), a commonly used strategy to compensate the duration variability issue. We have analysed in detailed the impact of training utterance partitioning in speaker recognition performance under GMM-SVM framework. We further investigate the reason why the utterance partitioning is important for boosting speaker recognition performance. We have also shown in which case the utterance partitioning could be useful and where not. Our study has revealed that utterance partitioning does not reduce the data imbalance problem of the GMM-SVM classifier as claimed in earlier study. Apart from these, we also discuss issues related to the impact of parameters such as number of Gaussians, supervector length, amount of splitting required for obtaining better performance in short and long duration test conditions from speech duration perspective. We have performed the experiments with telephone speech from POLYCOST corpus consisting of 130 speakers.
LGFeb 18, 2015
F0 Modeling In Hmm-Based Speech Synthesis System Using Deep Belief NetworkSankar Mukherjee, Shyamal Kumar Das Mandal
In recent years multilayer perceptrons (MLPs) with many hid- den layers Deep Neural Network (DNN) has performed sur- prisingly well in many speech tasks, i.e. speech recognition, speaker verification, speech synthesis etc. Although in the context of F0 modeling these techniques has not been ex- ploited properly. In this paper, Deep Belief Network (DBN), a class of DNN family has been employed and applied to model the F0 contour of synthesized speech which was generated by HMM-based speech synthesis system. The experiment was done on Bengali language. Several DBN-DNN architectures ranging from four to seven hidden layers and up to 200 hid- den units per hidden layer was presented and evaluated. The results were compared against clustering tree techniques pop- ularly found in statistical parametric speech synthesis. We show that from textual inputs DBN-DNN learns a high level structure which in turn improves F0 contour in terms of ob- jective and subjective tests.
SDJun 16, 2014
A Bengali HMM Based Speech Synthesis SystemSankar Mukherjee, Shyamal Kumar Das Mandal
The paper presents the capability of an HMM-based TTS system to produce Bengali speech. In this synthesis method, trajectories of speech parameters are generated from the trained Hidden Markov Models. A final speech waveform is synthesized from those speech parameters. In our experiments, spectral properties were represented by Mel Cepstrum Coefficients. Both the training and synthesis issues are investigated in this paper using annotated Bengali speech database. Experimental evaluation depicts that the developed text-to-speech system is capable of producing adequately natural speech in terms of intelligibility and intonation for Bengali.