AS · SD · May 16, 2020

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

arXiv:2005.08046v10.0054 citations
AI Analysis15

This paper addresses speaker verification in real-life far-field scenarios and is aimed at researchers and practitioners. Its contribution is incremental: it pairs existing methods with a new benchmark.

The paper tackles far-field speaker verification under cross-channel conditions by introducing a challenge with three tasks and a baseline system. The baseline achieves minDCFs of 0.62-0.66 and EERs of 6.27-7.18% across the three tasks.

The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020) addresses three research problems under well-defined conditions: far-field text-dependent speaker verification from a single microphone array, far-field text-independent speaker verification from a single microphone array, and far-field text-dependent speaker verification from distributed microphone arrays. All three tasks pose a cross-channel challenge to the participants. To simulate the real-life scenario, the enrollment utterances are recorded with a close-talk cellphone, while the test utterances are recorded with the far-field microphone arrays. In this paper, we describe the database, the challenge, and the baseline system, which uses a ResNet-based deep speaker network with cosine similarity scoring. For a given utterance, the speaker embeddings of the different channels are averaged with equal weights to form the final embedding. The baseline system achieves minDCFs of 0.62, 0.66, and 0.64 and EERs of 6.27%, 6.55%, and 7.18% for task 1, task 2, and task 3, respectively.
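The scoring step described in the abstract (equal-weight averaging of per-channel embeddings, then cosine similarity against the enrollment embedding) can be illustrated with a minimal sketch. This is not the authors' baseline code: the function names, the toy 3-dimensional embeddings, and the threshold-sweep EER approximation are all assumptions made for illustration, and the ResNet embedding extractor itself is left out.

```python
import math
from typing import List, Sequence


def average_embeddings(channel_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-channel embeddings of one utterance with equal weights."""
    if not channel_embeddings:
        raise ValueError("need at least one channel embedding")
    n = len(channel_embeddings)
    dim = len(channel_embeddings[0])
    return [sum(e[i] for e in channel_embeddings) / n for i in range(dim)]


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (higher means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false-accept and false-reject rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy example (hypothetical values): one close-talk enrollment embedding and
# a far-field test utterance captured on three array channels.
enroll = [0.9, 0.1, 0.2]
test_channels = [[0.8, 0.2, 0.1], [0.7, 0.1, 0.3], [0.9, 0.0, 0.2]]
test_embedding = average_embeddings(test_channels)
score = cosine_score(enroll, test_embedding)
```

A system would compute one such score per trial and then summarize the target and non-target score lists with metrics such as the EER. The minDCF reported in the abstract additionally depends on cost and prior parameters that the abstract does not state, so it is omitted here.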
