SD · AS · Aug 31, 2021

Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification

arXiv:2108.13843v1 · 46 citations
Originality: Incremental advance
AI Analysis

The work addresses domain mismatch in speaker verification, which matters for applications such as security systems and voice assistants. It is incremental, however, because it builds on existing unsupervised domain adaptation methods.

The paper tackles the performance degradation that speaker verification systems suffer when applied to a new domain. It proposes a self-supervised learning based domain adaptation approach (SSDA), which achieves a 10.2% Equal Error Rate on the CnCeleb dataset without using any CnCeleb speaker labels.

Large performance degradation is often observed for speaker verification systems when applied to a new domain dataset. Given an unlabeled target-domain dataset, unsupervised domain adaptation (UDA) methods, which usually leverage adversarial training strategies, are commonly used to bridge the performance gap caused by the domain mismatch. However, such an adversarial training strategy only uses the distribution information of the target-domain data and cannot ensure a performance improvement on the target domain. In this paper, we incorporate a self-supervised learning strategy into the unsupervised domain adaptation system and propose a self-supervised learning based domain adaptation approach (SSDA). Compared to the traditional UDA method, the new SSDA training strategy can fully leverage the potential label information from the target domain and adapt the speaker discrimination ability from the source domain simultaneously. We evaluated the proposed approach on the VoxCeleb (labeled source domain) and CnCeleb (unlabeled target domain) datasets. The best SSDA system obtains a 10.2% Equal Error Rate (EER) on the CnCeleb dataset without using any speaker labels on CnCeleb, which also achieves state-of-the-art results on this corpus.
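For readers unfamiliar with the metric quoted above, the Equal Error Rate is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a minimal illustration of that definition, not the authors' evaluation code. The function name `equal_error_rate` and the toy trial scores are assumptions made for this example, and the threshold sweep over observed scores is one simple way to approximate the crossing point.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate the EER from trial scores and binary labels.

    scores: similarity score per verification trial (higher = more similar).
    labels: 1 for a same-speaker (target) trial, 0 for a different-speaker trial.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = np.sum(labels == 1)
    n_nontarget = np.sum(labels == 0)

    best_gap, eer = np.inf, None
    # Try every observed score as a decision threshold.
    for threshold in np.unique(scores):
        accept = scores >= threshold
        # False acceptance: non-target trials that were accepted.
        far = np.sum(accept & (labels == 0)) / n_nontarget
        # False rejection: target trials that were rejected.
        frr = np.sum(~accept & (labels == 1)) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint at the threshold where the two rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical toy trials, for illustration only.
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(f"EER = {equal_error_rate(toy_scores, toy_labels):.1%}")
```

On a real benchmark the scores would come from comparing speaker embeddings across an official trial list, and evaluation toolkits usually interpolate between thresholds rather than taking the nearest one.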

