SD · Jun 22, 2017

On a Novel Speech Representation Using Multitapered Modified Group Delay Function

arXiv:1706.09386v3 · 1 citation
Originality: Incremental advance
AI Analysis

This paper offers an incremental improvement for speech processing applications, particularly those that need both speaker and speech information.

The paper tackles the problem of speech signal representation by proposing a method based on the multitaper modified group delay function (MT-MOGDF). The method outperforms an existing multitaper magnitude (MT-MAG) technique in variance and MSE, in both the spectral and cepstral domains, and improves speaker recognition by around 20% over the next-best technique.

In this paper, a novel multitaper modified group delay function-based (MT-MOGDF) representation for speech signals is proposed. With a set of phoneme-based experiments, it is shown that the proposed method performs better than an existing multitaper magnitude (MT-MAG) estimation technique, in terms of variance and MSE, in both the spectral and cepstral domains. In particular, MT-MOGDF is found to perform best with the Thomson tapers. Additionally, the utility of the MT-MOGDF technique is highlighted in a speaker recognition experimental setup, where an improvement of around $20\%$ over the next-best technique is obtained. Moreover, the computational requirements of the proposed technique are comparable to those of MT-MAG. The proposed feature can be used for many speech-related applications; in particular, it is best suited to those that require both speaker and speech information.
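To make the idea concrete, here is a minimal sketch of a multitapered modified group delay computation. It is not taken from the paper. It assumes the commonly used modified group delay definition, $\tau(\omega) = (X_R Y_R + X_I Y_I) / |S(\omega)|^{2\gamma}$ followed by $\mathrm{sign}(\tau)\,|\tau|^{\alpha}$, where $X$ and $Y$ are the spectra of $x[n]$ and $n\,x[n]$ and $S$ is a cepstrally smoothed magnitude spectrum. It also assumes that the per-taper estimates are combined by a plain average. Sine tapers stand in for the Thomson tapers to keep the sketch dependent on NumPy only. The function names and the values of `alpha`, `gamma`, `n_lifter` and `n_tapers` are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers, shape (n_tapers, n_samples).

    Assumption: used here as a simple stand-in for Thomson (DPSS) tapers.
    """
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))


def cepstrally_smoothed_magnitude(mag, n_fft, n_lifter):
    """Smooth a one-sided magnitude spectrum by low-quefrency liftering."""
    cep = np.fft.irfft(np.log(mag + 1e-12), n=n_fft)  # real cepstrum
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    if n_lifter > 1:
        lifter[-(n_lifter - 1):] = 1.0  # keep the mirrored low quefrencies
    return np.exp(np.fft.rfft(cep * lifter).real)


def modgdf(frame, n_fft=512, alpha=0.4, gamma=0.9, n_lifter=8):
    """Modified group delay function of one frame (n_fft must be even)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)   # spectrum of n * x[n]
    S = cepstrally_smoothed_magnitude(np.abs(X), n_fft, n_lifter)
    tau = (X.real * Y.real + X.imag * Y.imag) / S ** (2.0 * gamma)
    return np.sign(tau) * np.abs(tau) ** alpha


def multitaper_modgdf(frame, n_tapers=6, **kwargs):
    """Average the modified group delay over several tapered copies.

    Assumption: plain averaging across tapers; the paper may weight or
    combine the per-taper estimates differently.
    """
    tapers = sine_tapers(len(frame), n_tapers)
    return np.mean([modgdf(t * frame, **kwargs) for t in tapers], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # synthetic stand-in for a speech frame
    feature = multitaper_modgdf(frame, n_tapers=6, n_fft=512)
    print(feature.shape)
```

Averaging over several orthogonal tapers is what reduces the variance of the estimate relative to a single window, which is the property the abstract evaluates against MT-MAG.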

