AS · CL · LG · ML · Oct 28, 2017

Generalized End-to-End Loss for Speaker Verification

arXiv:1710.10467v5 · 1050 citations
Originality: Incremental advance
AI Analysis

This work addresses the efficiency and accuracy of speaker verification systems, particularly for applications such as voice-activated assistants. It is an incremental advance because it builds on the earlier tuple-based end-to-end (TE2E) loss.

The paper tackles inefficient training in speaker verification by proposing a new loss function, the generalized end-to-end (GE2E) loss. Compared with the authors' earlier TE2E loss, it reduces the equal error rate (EER) by more than 10% and cuts training time by 60%.
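For readers unfamiliar with the metric, EER is the operating point at which the false acceptance rate (impostors accepted) equals the false rejection rate (true speakers rejected). The sketch below approximates it by sweeping every observed score as a threshold. The function name `equal_error_rate` and the toy scores are illustrative assumptions, not taken from the paper, and this is not the paper's evaluation code.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher means 'same speaker').

    genuine:  scores for trials where the speaker really matches
    impostor: scores for trials where the speaker does not match
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: an impostor trial scores at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy scores: one genuine trial (0.4) falls below one impostor (0.6).
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

With only a handful of trials the two error rates may never cross exactly, which is why the sketch averages them at the closest point.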

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process. Additionally, the GE2E loss does not require an initial stage of example selection. With these properties, our model with the new loss function decreases speaker verification EER by more than 10%, while reducing the training time by 60% at the same time. We also introduce the MultiReader technique, which allows us to do domain adaptation - training a more accurate model that supports multiple keywords (i.e. "OK Google" and "Hey Google") as well as multiple dialects.
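The abstract does not spell out the loss itself. As a rough illustration, the following sketch follows the softmax variant of GE2E as described in the paper body: each utterance embedding is compared against every speaker centroid using a scaled cosine similarity, the utterance is left out of its own speaker's centroid, and a softmax cross-entropy pushes it toward its true speaker. The function names, the nested-list batch layout, and the fixed values `w=10.0` and `b=-5.0` (learnable parameters in the paper, shown here at their reported initial values) are simplifying assumptions. This is a plain-Python sketch for reading, not the authors' implementation.

```python
import math


def _mean(vectors):
    # Element-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _cos(a, b):
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def ge2e_softmax_loss(batch, w=10.0, b=-5.0):
    """batch[j][i] is the embedding of utterance i from speaker j.

    Every speaker needs at least two utterances so that a
    leave-one-out centroid exists.
    """
    centroids = [_mean(utts) for utts in batch]
    total = 0.0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            # Centroid of speaker j computed without utterance i.
            others = [u for m, u in enumerate(utts) if m != i]
            sims = []
            for k, c in enumerate(centroids):
                ref = _mean(others) if k == j else c
                sims.append(w * _cos(e, ref) + b)
            # Softmax cross-entropy with the true speaker j as the target.
            log_sum = math.log(sum(math.exp(s) for s in sims))
            total += log_sum - sims[j]
    return total


# Hypothetical 2-speaker, 2-utterance batch with 2-D embeddings.
separated = [[[1, 0], [0.9, 0.1]], [[0, 1], [0.1, 0.9]]]
mixed = [[[1, 0], [0, 1]], [[0.9, 0.1], [0.1, 0.9]]]
print(ge2e_softmax_loss(separated) < ge2e_softmax_loss(mixed))
```

Because every utterance in the batch is scored against every centroid at once, a single step uses many more comparisons than a tuple-based loss, which is one plausible reading of the efficiency claim above.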

Code Implementations: 30 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
