Rajesh M. Hegde

2papers

2 Papers

SDOct 19, 2016
A Bayesian Approach to Estimation of Speaker Normalization Parameters

Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde

In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker independent speech recognition system. The speaker normalization method proposed herein uses the technique of vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach which utilizes the Gibbs sampler, a special type of Markov Chain Monte Carlo method. Additionally the hyperparameters are estimated using maximum likelihood approach. This model is used assuming that human vocal tract can be modeled as a tube of uniform cross section. It captures the variation in length of the vocal tract of different speakers more effectively, than the linear model used in literature. The work has also investigated different methods like minimization of Mean Square Error (MSE) and Mean Absolute Error (MAE) for the estimation of VTLN parameters. Both single pass and two pass approaches are then used to build a VTLN based speech recognizer. Experimental results on recognition of vowels and Hindi phrases from a medium vocabulary indicate that the Bayesian method improves the performance by a considerable margin.

SDNov 25, 2014
A Complex Matrix Factorization approach to Joint Modeling of Magnitude and Phase for Source Separation

Chaitanya Ahuja, Karan Nathwani, Rajesh M. Hegde

Conventional NMF methods for source separation factorize the matrix of spectral magnitudes. Spectral Phase is not included in the decomposition process of these methods. However, phase of the speech mixture is generally used in reconstructing the target speech signal. This results in undesired traces of interfering sources in the target signal. In this paper the spectral phase is incorporated in the decomposition process itself. Additionally, the complex matrix factorization problem is reduced to an NMF problem using simple transformations. This results in effective separation of speech mixtures since both magnitude and phase are utilized jointly in the separation process. Improvement in source separation results are demonstrated using objective quality evaluations on the GRID corpus.