SD AS Nov 21, 2020

Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

arXiv:2011.10710v1
AI Analysis

This work provides an incremental improvement for researchers and developers working on text-dependent speaker verification systems with limited training data.

This paper explores using voice conversion for data augmentation in text-dependent speaker verification to address limited training data. The proposed method reduced the Equal Error Rate from 6.51% to 4.51% in the limited-training-data scenario.

In this paper, we focus on improving the performance of the text-dependent speaker verification system in the scenario of limited training data. A deep learning based text-dependent speaker verification system generally needs a large-scale text-dependent training data set, which can be labor-intensive and costly to collect, especially for customized new wake-up words. In recent studies, voice conversion systems that can generate high-quality synthesized speech of seen and unseen speakers have been proposed. Inspired by those works, we adopt two different voice conversion methods as well as a very simple re-sampling approach to generate new text-dependent speech samples for data augmentation purposes. Experimental results show that the proposed method reduces the Equal Error Rate from 6.51% to 4.51% in the scenario of limited training data.
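
The re-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the re-sampling factors or the resampler used, so everything below is an assumption for illustration: the function names `resample_speed` and `augment`, the factors 0.9 and 1.1, and the use of plain linear interpolation (which has no anti-aliasing filter; a real pipeline would likely use a band-limited resampler). Playing the re-sampled signal at the original sample rate changes both tempo and pitch, so the same phrase sounds as if it came from a different voice.

```python
import numpy as np


def resample_speed(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Return a copy of `waveform` that plays `factor` times faster
    when played back at the original sample rate.

    Hypothetical helper: uses linear interpolation only, no anti-aliasing.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(waveform)
    n_out = int(round(n_in / factor))
    # Positions in the original signal where the new samples are read.
    src_positions = np.linspace(0, n_in - 1, num=n_out)
    return np.interp(src_positions, np.arange(n_in), waveform)


def augment(waveform: np.ndarray, factors=(0.9, 1.1)) -> dict:
    """Create one re-sampled copy per factor (factors are assumed values)."""
    return {f: resample_speed(waveform, f) for f in factors}


# Demo on a synthetic one-second 220 Hz tone standing in for an utterance.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220 * t)

augmented = augment(tone)
for factor, signal in augmented.items():
    print(f"factor={factor}: {len(signal)} samples")
```

Each re-sampled copy keeps the spoken text but alters the apparent voice, which is why it can serve as an additional training sample for a text-dependent system; whether a copy is labeled as the original speaker or as a new one is a design choice the abstract does not specify.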

