SD · CL · AS · Dec 15, 2021

The exploitation of Multiple Feature Extraction Techniques for Speaker Identification in Emotional States under Disguised Voices

arXiv:2112.07940v1
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker identification for security or forensic applications, but it is incremental as it compares existing methods on a specific scenario.

The paper tackled speaker identification in disguised voices under emotional states by evaluating five feature extraction methods, finding that concatenated MFCCs, MFCCs-delta, and MFCCs-delta-delta performed best.

Driven by advances in artificial intelligence, speaker identification (SI) technologies have progressed considerably and are now widely used across many sectors. Feature extraction is one of the most important components of SI and has a substantial impact on the SI process and its performance. As a result, numerous feature extraction strategies are thoroughly investigated, contrasted, and analyzed. This article exploits five distinct feature extraction methods for speaker identification in disguised voices under emotional environments. Three voice-disguise effects are used in the evaluation: high-pitched, low-pitched, and Electronic Voice Conversion (EVC). Experimental results show that concatenating Mel-Frequency Cepstral Coefficients (MFCCs), MFCCs-delta, and MFCCs-delta-delta is the best-performing feature extraction method.
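To make the best-performing feature set concrete, here is a minimal sketch of how static MFCCs can be concatenated with their delta and delta-delta coefficients. The abstract does not say how the deltas were computed, so the regression formula, the edge padding, and the names `delta`, `stack_mfcc_deltas`, and `width` are all assumptions made for illustration. The sketch also assumes the MFCC frames have already been extracted by some front end.

```python
from typing import List

# One row per frame, one column per cepstral coefficient.
Frames = List[List[float]]


def delta(frames: Frames, width: int = 2) -> Frames:
    """Regression-based time derivative of each coefficient.

    Assumption: the common formula
        d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum n^2)
    with the first and last frames repeated at the edges.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out: Frames = []
    for t in range(n_frames):
        row = []
        for k in range(n_coeffs):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]   # clamp at last frame
                behind = frames[max(t - n, 0)][k]             # clamp at first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_mfcc_deltas(mfcc: Frames, width: int = 2) -> Frames:
    """Concatenate [MFCC | delta | delta-delta] for every frame."""
    d1 = delta(mfcc, width)   # first-order dynamics
    d2 = delta(d1, width)     # second-order dynamics
    return [static + first + second
            for static, first, second in zip(mfcc, d1, d2)]


if __name__ == "__main__":
    # Toy input: coefficient 0 rises linearly, coefficient 1 is constant.
    toy = [[float(t), 2.0] for t in range(6)]
    features = stack_mfcc_deltas(toy)
    print(len(features), len(features[0]))
```

Each output frame is three times as wide as the input frame, so a 13-coefficient MFCC front end would yield 39-dimensional vectors under this scheme. The classifier and the four other feature extraction methods compared in the paper are not covered by this sketch.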

