I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
This work addresses the challenge of unreliable speaker verification on short utterances. It is incremental in that it builds on existing i-vector and GAN methods.
The paper tackles the poor performance of i-vector based speaker verification on short utterances by proposing an i-vector compensation method based on a conditional generative adversarial network (GAN). On the NIST SRE 2008 "10sec-10sec" condition, this method reduced the equal error rate (EER) by 11.3% compared with the conventional i-vector and PLDA system.
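To make the reported metric concrete, the following is a minimal Python sketch of how an equal error rate can be estimated from verification scores. The EER is the operating point where the miss rate on target (same-speaker) trials equals the false-alarm rate on non-target (different-speaker) trials. The function name `equal_error_rate` and the simple threshold sweep are illustrative assumptions, not the paper's evaluation code or the official NIST scoring tools.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= threshold. Returns the mean of the
    miss and false-alarm rates at the threshold where the two are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    n_target = len(target_scores)
    n_nontarget = len(nontarget_scores)
    best_gap = float("inf")
    eer = 1.0
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a same-speaker trial scored below the threshold (wrongly rejected).
        p_miss = sum(s < threshold for s in target_scores) / n_target
        # False alarm: a different-speaker trial at or above the threshold (wrongly accepted).
        p_fa = sum(s >= threshold for s in nontarget_scores) / n_nontarget
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap = gap
            eer = (p_miss + p_fa) / 2
    return eer
```

With toy scores such as `equal_error_rate([2.0, 1.5, 0.8, 0.3], [-1.0, -0.5, 0.5, 1.0])`, the miss and false-alarm rates meet at 0.25. Real evaluations usually interpolate between thresholds on the DET curve, so this step-wise estimate is only an approximation.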
I-vector based text-independent speaker verification (SV) systems often perform poorly on short utterances, because the biased phonetic distribution of a short utterance makes the extracted i-vector unreliable. This paper proposes an i-vector compensation method using a generative adversarial network (GAN). Its generator network is trained to produce a compensated i-vector from a short-utterance i-vector, and its discriminator network is trained to determine whether an i-vector was produced by the generator or extracted from a long utterance. Additionally, we assign two other learning tasks to the GAN to stabilize its training and to make the generated i-vector more speaker-specific. Speaker verification experiments on the NIST SRE 2008 "10sec-10sec" condition show that our method reduced the equal error rate by 11.3% compared with the conventional i-vector and PLDA system.
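The adversarial part of the objective described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python that assumes a linear generator, a logistic discriminator, and a discriminator conditioned on the short-utterance i-vector by concatenation (one common way to build a conditional GAN). The paper's actual architectures are not given here, and the two additional learning tasks mentioned in the abstract are deliberately omitted. All names (`generator`, `discriminator`, `gan_losses`) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def generator(w_short, W, b):
    """Hypothetical linear generator: compensated i-vector = W @ w_short + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, w_short)) + b_i
            for row, b_i in zip(W, b)]


def discriminator(ivec, cond, v, c):
    """Hypothetical logistic discriminator on the concatenation [ivec; cond].

    Returns the estimated probability that `ivec` is a real long-utterance
    i-vector, given the short-utterance i-vector `cond` as the condition.
    """
    z = sum(v_i * x_i for v_i, x_i in zip(v, ivec + cond)) + c
    return sigmoid(z)


def gan_losses(w_short, w_long, W, b, v, c, eps=1e-12):
    """Adversarial losses for one (short, long) i-vector pair of the same speaker.

    The auxiliary tasks described in the abstract are NOT included here.
    """
    fake = generator(w_short, W, b)
    d_real = discriminator(w_long, w_short, v, c)
    d_fake = discriminator(fake, w_short, v, c)
    # Discriminator: binary cross-entropy with long-utterance i-vectors labelled 1
    # and generated (compensated) i-vectors labelled 0.
    loss_d = -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))
    # Generator (non-saturating form): push D to label generated i-vectors as real.
    loss_g = -math.log(d_fake + eps)
    return loss_d, loss_g
```

In practice both networks would be deep models trained with alternating gradient steps on `loss_d` and `loss_g`; this sketch only evaluates the two losses for a single training pair, which is enough to show how the generator and discriminator objectives oppose each other.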