SD AI AS Nov 25, 2024

The SVASR System for Text-dependent Speaker Verification (TdSV) AAIC Challenge 2024

arXiv:2411.16276v12.72 citations h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for high-performance biometric systems. Its contribution is incremental: it combines existing methods to improve accuracy in a specific domain.

The paper tackles text-dependent speaker verification with a pipeline that uses a Fast-Conformer ASR module to filter trials and fuses embeddings from wav2vec-BERT and ReDimNet into a single speaker representation. The system achieves a normalized min-DCF of 0.0452 and ranks second in the TDSV 2024 Challenge.
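
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a normalized minimum detection cost (min-DCF) is commonly computed from trial scores. The cost parameters (`p_target=0.01`, `c_miss=10`, `c_fa=1`) and the function name are assumptions for illustration; the text above does not state the challenge's settings.

```python
import numpy as np

def normalized_min_dcf(target_scores, nontarget_scores,
                       p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Normalized minimum detection cost over all decision thresholds.

    The default cost parameters are illustrative assumptions,
    not values taken from the paper.
    """
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every observed score, plus +inf ("reject everything").
    thresholds = np.concatenate([np.unique(np.concatenate([tar, non])), [np.inf]])

    # A trial is accepted when its score is >= the threshold.
    p_miss = np.array([(tar < t).mean() for t in thresholds])  # targets rejected
    p_fa = np.array([(non >= t).mean() for t in thresholds])   # non-targets accepted

    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

    # Normalize by the cost of the best trivial system (accept all or reject all).
    dcf_default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return float(dcf.min() / dcf_default)
```

Under this definition, a system that separates target and non-target scores perfectly scores 0, and a system no better than a trivial accept-all or reject-all decision scores 1.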

This paper introduces an efficient and accurate pipeline for text-dependent speaker verification (TDSV), designed to address the need for high-performance biometric systems. The proposed system incorporates a Fast-Conformer-based ASR module to validate speech content, filtering out Target-Wrong (TW) and Impostor-Wrong (IW) trials. For speaker verification, we propose a feature fusion approach that combines speaker embeddings extracted from wav2vec-BERT and ReDimNet models to create a unified speaker representation. This system achieves competitive results on the TDSV 2024 Challenge test set, with a normalized min-DCF of 0.0452 (rank 2), highlighting its effectiveness in balancing accuracy and robustness.
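
The two-stage flow described in the abstract can be sketched roughly as follows. This is not the authors' implementation: the embeddings are stand-in arrays rather than real wav2vec-BERT or ReDimNet outputs, the ASR transcript is passed in as a plain string, and the exact-match phrase check, the concatenation-based fusion, the cosine scoring, and all function names are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x):
    """Scale a vector to unit length (small epsilon guards against zero vectors)."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x) + 1e-12)

def fuse_embeddings(emb_a, emb_b):
    """Assumed fusion: normalize each embedding, concatenate, renormalize."""
    return l2_normalize(np.concatenate([l2_normalize(emb_a), l2_normalize(emb_b)]))

def normalize_text(text):
    """Lowercase and collapse whitespace before comparing phrases."""
    return " ".join(text.lower().split())

def score_trial(expected_phrase, asr_transcript, enroll_embs, test_embs,
                reject_score=-1.0):
    """Score one trial: content check first, then speaker similarity.

    enroll_embs and test_embs are (embedding_a, embedding_b) pairs, standing in
    for the outputs of the two speaker models.
    """
    # Stage 1: reject trials whose spoken content does not match the expected
    # phrase (the Target-Wrong and Impostor-Wrong cases).
    if normalize_text(asr_transcript) != normalize_text(expected_phrase):
        return reject_score

    # Stage 2: cosine similarity between the fused, unit-length representations.
    enroll = fuse_embeddings(*enroll_embs)
    test = fuse_embeddings(*test_embs)
    return float(np.dot(enroll, test))
```

Normalizing each embedding before concatenation keeps either model from dominating the fused vector purely because of its scale, which is one reason this simple form of fusion is a common baseline.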
