AS · CL · SD · Jul 13, 2019

BUT VOiCES 2019 System Description

arXiv:1907.06112v1 · 1 citation
AI Analysis

This is an incremental improvement in speaker recognition for a specific challenge, with limited broader impact.

The paper tackled the VOiCES 2019 Speaker Recognition challenge, achieving a 1.0% EER with a fusion of three systems, which is a 15% relative improvement over the single best system.

This paper describes our submission to the VOiCES 2019 Speaker Recognition challenge. All systems in the fixed condition are based on the x-vector paradigm, with different features and DNN topologies. The single best system reaches 1.2% EER, and a fusion of three systems yields 1.0% EER, a 15% relative improvement. The open condition allowed us to use external data, which we used for PLDA adaptation, achieving a relative improvement of less than about 10%. Our open-condition submission used three x-vector systems and one i-vector-based system.
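The relative improvement quoted above is a ratio of equal error rates (EER). As a minimal sketch of that arithmetic, the Python snippet below computes it from the two EER figures in the abstract. The function name `relative_improvement` is hypothetical and not taken from the paper. Note that the rounded figures (1.2% and 1.0%) give roughly 16.7%; the reported 15% presumably comes from unrounded EER values, which is an assumption since the abstract does not list them.

```python
def relative_improvement(baseline_eer: float, improved_eer: float) -> float:
    """Return the relative EER reduction, in percent, of improved over baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - improved_eer) / baseline_eer


# EER values in percent, as rounded in the abstract (not the unrounded results).
single_best = 1.2
fusion = 1.0

gain = relative_improvement(single_best, fusion)
print(f"{gain:.1f}% relative improvement")
```

Because EER is a "lower is better" metric, the improvement is measured relative to the baseline (the single best system), not relative to the fused result.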

