On a Novel Speech Representation Using Multitapered Modified Group Delay Function
The work is an incremental improvement for speech processing applications, particularly those that require both speaker and speech information.
The paper tackles speech signal representation by proposing a method based on the multitaper modified group delay function. The method outperforms an existing multitaper magnitude technique in terms of variance and MSE, in both the spectral and cepstral domains, and achieves an improvement of around 20% over the next-best technique in speaker recognition.
In this paper, a novel multitaper modified group delay function-based (MT-MOGDF) representation for speech signals is proposed. A set of phoneme-based experiments shows that the proposed method performs better than an existing multitaper magnitude (MT-MAG) estimation technique in terms of variance and MSE, in both the spectral and cepstral domains. In particular, MT-MOGDF performs best with the Thomson tapers. Additionally, the utility of the MT-MOGDF technique is highlighted in a speaker recognition experimental setup, where an improvement of around $20\%$ over the next-best technique is obtained. Moreover, the computational requirements of the proposed technique are comparable to those of MT-MAG. The proposed feature can be used in many speech-related applications; in particular, it is best suited to those that require both speaker and speech information.
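To make the idea concrete, the following is a minimal sketch of one plausible way to combine multitapering with the modified group delay function. It is not the paper's implementation, and several choices are assumptions made for illustration. The per-taper estimates are averaged with uniform weights. Sine tapers are used only because they have a closed form, whereas the abstract reports the best results with Thomson tapers. The values of `alpha`, `gamma`, and `n_lifter` are illustrative. The function names (`sine_tapers`, `cepstral_smooth`, `modgdf`, `mt_modgdf`) are hypothetical.

```python
import numpy as np


def sine_tapers(n_samples, n_tapers):
    """Closed-form sine tapers, shape (n_tapers, n_samples); rows are orthonormal."""
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))


def cepstral_smooth(mag, n_lifter):
    """Smooth a full-length FFT magnitude spectrum by low-time cepstral liftering."""
    log_mag = np.log(np.maximum(mag, 1e-10))  # floor avoids log(0)
    cep = np.fft.ifft(log_mag).real
    lifter = np.zeros_like(cep)
    lifter[:n_lifter] = 1.0
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0  # keep the mirrored low-time coefficients
    return np.exp(np.fft.fft(cep * lifter).real)


def modgdf(frame, n_fft=512, alpha=0.4, gamma=0.9, n_lifter=8):
    """Modified group delay function of one frame (non-negative frequencies only)."""
    n = np.arange(len(frame))
    X = np.fft.fft(frame, n_fft)        # spectrum of x(n)
    Y = np.fft.fft(n * frame, n_fft)    # spectrum of n * x(n)
    S = cepstral_smooth(np.abs(X), n_lifter)
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))
    tau_m = np.sign(tau) * np.abs(tau) ** alpha  # compress spikes, keep the sign
    return tau_m[: n_fft // 2 + 1]


def mt_modgdf(frame, n_tapers=6, **kwargs):
    """Assumed multitaper variant: uniform average of per-taper MODGDF estimates."""
    tapers = sine_tapers(len(frame), n_tapers)
    per_taper = np.stack([modgdf(w * frame, **kwargs) for w in tapers])
    return per_taper.mean(axis=0)


if __name__ == "__main__":
    # Synthetic 25 ms frame at 16 kHz: two sinusoids plus a little noise.
    rng = np.random.default_rng(0)
    t = np.arange(400) / 16000.0
    frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
    frame = frame + 0.01 * rng.standard_normal(t.size)
    feature = mt_modgdf(frame, n_tapers=6, n_fft=512)
    print(feature.shape)
```

Averaging over several orthogonal tapers is what reduces the variance of the estimate relative to a single-window computation, which is the property the abstract evaluates. Cepstral-domain features would typically follow from a DCT of the averaged function, a step that is omitted here.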