AS CL SD, Oct 16, 2019

BUT System Description to VoxCeleb Speaker Recognition Challenge 2019

Hossein Zeinali, Shuai Wang, Anna Silnova, Pavel Matějka, Oldřich Plchot

arXiv:1910.12592v128.9289 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses speaker recognition for the VoxSRC community, presenting an incremental improvement through system fusion and fine-tuning.

The paper describes Brno University of Technology's submission to the VoxCeleb Speaker Recognition Challenge 2019. The submitted systems fuse four CNN topologies, two based on ResNet34 and two on the x-vector architecture, and achieve equal error rates (EER) of 1.42% and 1.26% on the Fixed and Open conditions, respectively.

In this report, we describe the submission of the Brno University of Technology (BUT) team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2019. We also provide a brief analysis of different systems on the VoxCeleb-1 test sets. The submitted systems for both the Fixed and Open conditions are a fusion of four Convolutional Neural Network (CNN) topologies. The first and second networks have a ResNet34 topology and use two-dimensional CNNs. The last two networks are one-dimensional CNNs based on the x-vector extraction topology. Some of the networks are fine-tuned using additive margin angular softmax. Kaldi FBanks and Kaldi PLPs were used as features. The Fixed and Open systems differ in the training data used and in the fusion strategy. The best systems for the Fixed and Open conditions achieved 1.42% and 1.26% EER, respectively, on the challenge evaluation set.
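To make the reported metric concrete, the following is a minimal sketch of how an equal error rate (EER) can be computed from trial scores, together with a simple weighted score-level fusion. It is not the BUT team's code: the abstract does not specify the fusion method, so the equal-weight averaging here is an assumption, and the function names `compute_eer` and `fuse_scores` are hypothetical.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Return the equal error rate (as a fraction) for two sets of trial scores.

    Higher scores are assumed to mean "more likely the same speaker".
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # The EER is read off where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(system_scores, weights=None):
    """Fuse per-system scores, shape (n_systems, n_trials), by weighted sum.

    Assumption: equal weights when none are given. The actual fusion
    strategy of the submitted systems is not described in the abstract.
    """
    system_scores = np.asarray(system_scores, dtype=float)
    if weights is None:
        n_systems = system_scores.shape[0]
        weights = np.full(n_systems, 1.0 / n_systems)
    return np.asarray(weights, dtype=float) @ system_scores
```

In practice the fusion weights would typically be trained on a held-out set, and the scores of each system may need calibration or normalization before they are combined; both steps are omitted from this sketch.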
