SDAIAS Aug 13, 2024

Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation

arXiv:2408.06804v1
h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses speaker identification for security, forensics, and personalized services, but its contribution is incremental: it tunes and evaluates existing methods on a single dataset.

The research evaluates six model architectures for speaker identification on the AB-1 Corpus, applies hyperparameter tuning to the best-performing one, and uses a linguistic analysis to check accent and gender accuracy and to evaluate bias.
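One common way to look for the kind of accent or gender bias mentioned above is to break identification accuracy down by group and compare the groups. The sketch below is only an illustration of that idea, not the paper's procedure: the function names `accuracy_by_group` and `accuracy_gap`, the speaker IDs, and the group labels are all hypothetical toy values, not AB-1 Corpus data.

```python
from collections import defaultdict


def accuracy_by_group(true_ids, predicted_ids, groups):
    """Return identification accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(true_ids, predicted_ids, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy (0.0 means no gap)."""
    return max(per_group.values()) - min(per_group.values())


# Toy, made-up labels purely for illustration (not results from the paper).
true_ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
predicted_ids = ["s1", "s2", "s3", "s4", "s4", "s1"]
gender = ["f", "f", "f", "m", "m", "m"]

per_gender = accuracy_by_group(true_ids, predicted_ids, gender)
print(per_gender, accuracy_gap(per_gender))
```

The same helper works for accent labels by passing a different `groups` list. A large gap suggests the model serves some groups worse than others, though on a small corpus the gap should be read with caution.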

In security systems, forensic investigations, and personalized services, speech is a fundamental human input whose importance outweighs that of text-based interaction. This research examines the essential components of Speaker Identification (SID), emphasising Mel Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. It evaluates six slightly distinct model architectures through extensive performance analysis and applies hyperparameter tuning to the best-performing model. It also performs a linguistic analysis to verify accent and gender accuracy, in addition to a bias evaluation within the AB-1 Corpus dataset.
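To make the two feature types named in the abstract concrete, here is a minimal sketch of how a log-mel spectrogram and MFCCs can be derived from a waveform. It is a textbook-style illustration using only the Python standard library, not the paper's pipeline: the frame length, hop size, number of mel bands, number of coefficients, and the 8 kHz test tone are all assumptions chosen to keep the example small. In practice a library routine such as `librosa.feature.mfcc` would normally be used instead.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of one Hann-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [mel_max * i / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    filters = []
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            row[k] = (right - k) / (right - center)
        filters.append(row)
    return filters


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_mels=20, n_mfcc=13):
    """Return (log-mel spectrogram, MFCCs), each as a list with one row per frame."""
    filters = mel_filterbank(n_mels, frame_len, sample_rate)
    mel_spec, coeffs = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) for empty bands.
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                   for row in filters]
        mel_spec.append(log_mel)
        # DCT-II of the log-mel energies gives the cepstral coefficients.
        coeffs.append([
            sum(v * math.cos(math.pi * j * (m + 0.5) / n_mels)
                for m, v in enumerate(log_mel))
            for j in range(n_mfcc)
        ])
    return mel_spec, coeffs


# Assumed test input: a 440 Hz tone sampled at 8 kHz (not real speech).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
mel_spec, coeffs = mfcc(tone, sample_rate)
print(len(mel_spec), "frames,", len(mel_spec[0]), "mel bands,", len(coeffs[0]), "MFCCs")
```

The mel spectrogram keeps the full band-by-band energy picture, while the MFCCs compress each frame into a handful of decorrelated coefficients. Either representation can serve as the input to a speaker identification model.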

Code Implementations: 1 repo