SD | AS | Oct 9, 2021

Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

arXiv:2110.04438v2 | 25 citations
AI Analysis

This paper addresses computational efficiency in speaker verification for resource-limited applications. It is an incremental improvement over symmetrical systems.

The paper tackles the challenge of designing a lightweight, robust speaker verification system. It proposes an asymmetric structure that uses a large-scale model (ECAPA-TDNN) for enrollment and a small-scale model (ECAPA-TDNNLite) for verification. This reduces the EER on the VoxCeleb1 test set to 2.31%, down from 3.07% for the symmetrical ECAPA-TDNNLite system, without increasing the computational cost of verification.
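For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of one common way to estimate it from trial scores. The function name `equal_error_rate` and the toy scores are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER from trial scores.

    scores: similarity score per trial (higher means "same speaker").
    labels: True for target (same-speaker) trials, False for impostor trials.
    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Try every observed score as a decision threshold.
    thresholds = np.unique(scores)

    # False acceptance rate: impostor trials scored at or above the threshold.
    far = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    # False rejection rate: target trials scored below the threshold.
    frr = np.array([np.mean(scores[labels] < t) for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


# Toy trials: two target trials and two impostor trials.
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_labels = [True, True, False, False]
print(equal_error_rate(toy_scores, toy_labels))
```

On real trial lists, evaluation toolkits usually interpolate along the ROC or DET curve rather than picking the nearest observed threshold, so this sketch may differ slightly from reported values.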

With the development of deep learning, automatic speaker verification has made considerable progress over the past few years. However, designing a lightweight and robust system with limited computational resources is still a challenging problem. Traditionally, a speaker verification system is symmetrical, meaning that the same embedding extraction model is applied for both enrollment and verification at inference time. In this paper, we propose an asymmetric structure, which uses the large-scale ECAPA-TDNN model for enrollment and the small-scale ECAPA-TDNNLite model for verification. As a symmetrical system, our proposed ECAPA-TDNNLite model achieves an EER of 3.07% on the VoxCeleb1 original test set with only 11.6M FLOPs. Moreover, the asymmetric structure further reduces the EER to 2.31%, without increasing the computational cost of verification.
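The asymmetric enroll-verify idea can be sketched as follows: a speaker profile is built once with the large model, and each incoming test utterance is embedded with the small model only. This sketch assumes that both models emit embeddings of the same dimension in a shared space and that trials are scored with cosine similarity. The abstract specifies neither detail. The class `AsymmetricVerifier`, the toy stand-in models, and the threshold are hypothetical placeholders, not ECAPA-TDNN or ECAPA-TDNNLite.

```python
import numpy as np


def l2_normalize(x):
    """Scale a vector to unit length."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)


class AsymmetricVerifier:
    """Enroll with one embedding model, verify with another (illustrative)."""

    def __init__(self, enroll_model, verify_model, threshold):
        self.enroll_model = enroll_model    # large model, run once per speaker
        self.verify_model = verify_model    # small model, run on every trial
        self.threshold = threshold          # assumed cosine decision threshold
        self.profiles = {}

    def enroll(self, speaker_id, utterances):
        # Average the length-normalized large-model embeddings into a profile.
        embs = [l2_normalize(self.enroll_model(u)) for u in utterances]
        self.profiles[speaker_id] = l2_normalize(np.mean(embs, axis=0))

    def verify(self, speaker_id, utterance):
        # Only the small model runs here, so verification cost is unchanged.
        test_emb = l2_normalize(self.verify_model(utterance))
        score = float(np.dot(self.profiles[speaker_id], test_emb))
        return score, score >= self.threshold


# Toy stand-ins (assumptions): the "large" model passes features through,
# and the "small" model is a coarser approximation of the same mapping.
def large_model(features):
    return np.asarray(features, dtype=float)


def small_model(features):
    return np.round(np.asarray(features, dtype=float), 1)


verifier = AsymmetricVerifier(large_model, small_model, threshold=0.5)
verifier.enroll("alice", [[1.0, 0.2, 0.1, 0.0], [0.9, 0.3, 0.0, 0.1]])

genuine_score, genuine_ok = verifier.verify("alice", [0.96, 0.24, 0.04, 0.06])
impostor_score, impostor_ok = verifier.verify("alice", [0.1, 0.0, 1.0, 0.3])
print(genuine_ok, impostor_ok)
```

The design point is that enrollment happens rarely and can afford the larger model, while the per-trial cost is set entirely by the small model.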

