LG · SD · Sep 27, 2015

End-to-End Text-Dependent Speaker Verification

arXiv:1509.08062v1 · 614 citations
AI Analysis

This work addresses the need for efficient and accurate speaker verification systems in applications like voice assistants, though it appears incremental as it builds on existing neural network approaches.

The paper tackles text-dependent speaker verification by introducing an end-to-end neural network that maps a test utterance and a few reference utterances directly to a verification score, optimizing all components jointly. It is evaluated on an internal "Ok Google" benchmark, where the authors report that it appears very effective for big-data applications that need high accuracy and a small footprint.

In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time. Such an approach will result in simple and efficient systems, requiring little domain-specific knowledge and making few model assumptions. We implement the idea by formulating the problem as a single neural network architecture, including the estimation of a speaker model on only a few utterances, and evaluate it on our internal "Ok Google" benchmark for text-dependent speaker verification. The proposed approach appears to be very effective for big data applications like ours that require highly accurate, easy-to-maintain systems with a small footprint.

Code Implementations: 3 repos
