AS SD · Mar 26, 2020

In defence of metric learning for speaker recognition

arXiv:2003.11982v2 · 493 citations
AI Analysis

This paper addresses speaker recognition for unseen speakers. It presents an incremental improvement that challenges a common belief in the field: that networks trained with classification objectives outperform metric learning methods.

The paper tackles open-set speaker recognition by evaluating loss functions on the VoxCeleb dataset. It shows that the vanilla triplet loss is competitive with classification-based losses, and that networks trained with the authors' proposed metric learning objective outperform state-of-the-art approaches.

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of the most popular loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that the vanilla triplet loss shows competitive performance compared to classification-based losses, and that networks trained with our proposed metric learning objective outperform state-of-the-art methods.
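
To make the "small intra-speaker, large inter-speaker distance" goal concrete, here is a minimal sketch of a generic vanilla triplet loss in plain Python. It is not the paper's proposed objective, which this summary does not specify. The function names, the squared Euclidean distance on length-normalised embeddings, and the margin value of 0.3 are all assumptions chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Generic triplet loss on unit-length embeddings.

    anchor and positive are embeddings of the same speaker;
    negative is an embedding of a different speaker.
    The margin of 0.3 is an assumed value, not one from the paper.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    # The loss is zero once the negative is farther from the anchor
    # than the positive by at least the margin.
    return max(0.0, squared_distance(a, p) - squared_distance(a, n) + margin)


# Same-speaker pair is close and the other speaker is far: no penalty.
easy = triplet_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0])
# Same-speaker pair is far and the other speaker is close: large penalty.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(easy, hard)
```

The loss only penalises triplets where the different-speaker embedding is not yet farther from the anchor than the same-speaker embedding by the margin, which is how it pushes intra-speaker distances down and inter-speaker distances up.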
