SD AS
Sep 8, 2021

Beijing ZKJ-NPU Speaker Verification System for VoxCeleb Speaker Recognition Challenge 2021

Li Zhang, Huan Zhao, Qinling Meng, Yanli Chen, Min Liu, Lei Xie

arXiv:2109.03568v2

Originality Synthesis-oriented

AI Analysis

This work addresses speaker verification for audio processing applications, but it is incremental as it builds on existing methods for a competition.

The team tackled speaker verification in the VoxCeleb Speaker Recognition Challenge 2021 by exploring neural network structures and introducing ResNet-DTCF, CoAtNet, and PyConv networks, achieving second place with minDCF/EER scores of 0.1205/2.8160% and 0.1175/2.8400% for two tracks.

In this report, we describe the Beijing ZKJ-NPU team's submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We participated in the fully supervised speaker verification tracks, track 1 and track 2. In the challenge, we explored a range of advanced neural network structures with different pooling layers and objective loss functions. In addition, we introduced the ResNet-DTCF, CoAtNet and PyConv networks to improve the performance of CNN-based speaker embedding models. Moreover, we applied embedding normalization and score normalization at the evaluation stage. By fusing 11 and 14 systems, our best final performances (minDCF/EER) on the evaluation trials are 0.1205/2.8160% and 0.1175/2.8400% for track 1 and track 2, respectively. With our submission, we placed second in the challenge on both tracks.
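The two reported metrics, minDCF and EER, can be illustrated with a minimal sketch. This is not the challenge's official scoring tool or the authors' code. The function names are hypothetical, ties between scores are handled naively, and the cost parameters (`p_target=0.05`, `c_miss=1`, `c_fa=1`) are assumptions that should be checked against the VoxSRC-21 evaluation plan.

```python
import numpy as np

def miss_and_false_alarm(scores, labels):
    """Sweep a threshold over the sorted scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Entry k of each returned array corresponds to rejecting the k
    lowest-scoring trials and accepting the rest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    sorted_labels = labels[np.argsort(scores)]  # ascending by score
    n_target = sorted_labels.sum()
    n_nontarget = len(sorted_labels) - n_target
    # Targets among the rejected trials are misses.
    miss = np.concatenate(([0], np.cumsum(sorted_labels))) / n_target
    # Non-targets among the accepted trials are false alarms.
    rejected_non = np.concatenate(([0], np.cumsum(1 - sorted_labels)))
    false_alarm = (n_nontarget - rejected_non) / n_nontarget
    return miss, false_alarm

def compute_eer(scores, labels):
    """Equal error rate: the point where miss and false-alarm rates meet."""
    miss, fa = miss_and_false_alarm(scores, labels)
    idx = np.argmin(np.abs(miss - fa))
    return (miss[idx] + fa[idx]) / 2.0

def compute_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost (cost parameters are assumptions)."""
    miss, fa = miss_and_false_alarm(scores, labels)
    dcf = c_miss * p_target * miss + c_fa * (1.0 - p_target) * fa
    # Normalize by the cost of the best trivial accept-all/reject-all system.
    default_cost = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return dcf.min() / default_cost

# Toy trial list: four target and four non-target scores.
toy_scores = [0.9, 0.8, 0.7, 0.35, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_eer = compute_eer(toy_scores, toy_labels)
toy_min_dcf = compute_min_dcf(toy_scores, toy_labels)
print(f"EER = {100 * toy_eer:.2f}%, minDCF = {toy_min_dcf:.4f}")
```

In the toy list, one target trial (0.35) scores below one non-target trial (0.4), so no threshold separates the two classes perfectly. Both metrics are threshold sweeps over the same miss and false-alarm curves, which is why the report gives them as a pair.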
