SDAS Sep 8, 2021

Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021

arXiv:2109.03568v2, 11 citations
AI Analysis

This work addresses speaker verification for audio processing applications, but it is incremental as it builds on existing methods for a competition.

For the VoxCeleb Speaker Recognition Challenge 2021, the team explored a range of neural network structures and introduced ResNet-DTCF, CoAtNet, and PyConv networks, placing second on both tracks with minDCF/EER of 0.1205/2.8160% (track 1) and 0.1175/2.8400% (track 2).

In this report, we describe the Beijing ZKJ-NPU team submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We participated in the fully supervised speaker verification tracks 1 and 2. In the challenge, we explored various advanced neural network structures with different pooling layers and objective loss functions. In addition, we introduced the ResNet-DTCF, CoAtNet and PyConv networks to improve the performance of CNN-based speaker embedding models. Moreover, we applied embedding normalization and score normalization at the evaluation stage. By fusing 11 and 14 systems, we achieved final best results (minDCF/EER) on the evaluation trials of 0.1205/2.8160% and 0.1175/2.8400% for tracks 1 and 2, respectively. With our submission, we placed second in the challenge on both tracks.
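The evaluation-stage steps named in the abstract (embedding normalization, score normalization, and the EER metric) can be illustrated with a minimal sketch. This is not the team's implementation: the abstract does not say which normalization variants were used, so the example assumes L2 length normalization, cosine scoring, and symmetric score normalization (S-norm) against a cohort. All function names (`length_normalize`, `cosine_score`, `s_norm`, `equal_error_rate`) and the toy data are hypothetical.

```python
import numpy as np


def length_normalize(emb):
    """Scale each embedding (last axis) to unit L2 norm."""
    emb = np.asarray(emb, dtype=float)
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)


def cosine_score(enroll, test):
    """Cosine similarity between one enrollment and one test embedding."""
    return float(length_normalize(enroll) @ length_normalize(test))


def s_norm(raw, enroll, test, cohort):
    """Symmetric score normalization (an assumed variant, not confirmed
    by the source). The raw score is z-normalized with the cohort score
    statistics of each side, and the two results are averaged.
    Assumes the cohort scores are not all identical (non-zero std)."""
    cohort = length_normalize(cohort)
    e_scores = cohort @ length_normalize(enroll)  # enrollment vs cohort
    t_scores = cohort @ length_normalize(test)    # test vs cohort
    z_e = (raw - e_scores.mean()) / e_scores.std()
    z_t = (raw - t_scores.mean()) / t_scores.std()
    return 0.5 * (z_e + z_t)


def equal_error_rate(scores, labels):
    """Approximate EER: the threshold where the false-accept rate and
    false-reject rate are closest. Labels are 1 (target) or 0
    (non-target). Tied scores are not treated specially."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores)            # highest score first
    labels = labels[order]
    accepted_targets = np.cumsum(labels)   # accepting the top-k trials
    accepted_nontargets = np.cumsum(~labels)
    frr = 1.0 - accepted_targets / labels.sum()
    far = accepted_nontargets / (~labels).sum()
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])


# Toy usage with made-up 2-dimensional embeddings.
cohort = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
enroll = np.array([2.0, 0.0])
test = np.array([1.0, 0.0])
raw = cosine_score(enroll, test)
normed = s_norm(raw, enroll, test, cohort)
eer = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

In a real system the cohort would typically be drawn from training speakers, and the normalized scores from several systems would then be fused before computing minDCF and EER; those choices are outside what the abstract specifies.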

