SDAS Mar 24, 2018

MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

arXiv:1803.09059v1, 15 citations
Originality: Incremental advance
AI Analysis

This work addresses speaker verification, particularly for short utterances, with incremental improvements over existing methods.

The paper tackles speaker verification by proposing MTGAN, an enhanced triplet method that combines generative adversarial networks and multitasking optimization to improve how embeddings are encoded. On short utterances, it reports a 67% relative reduction in equal error rate (EER) compared to the conventional i-vector method and a 32% relative reduction compared to a state-of-the-art triplet loss method.
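Because the reported gains are relative rather than absolute, it can help to spell out the arithmetic. The sketch below shows how a relative EER reduction is computed. The function name and the EER values are hypothetical, chosen only to illustrate the calculation; they are not figures from the paper.

```python
def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the fraction by which new_eer is lower than baseline_eer."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values (NOT taken from the paper), as fractions.
baseline = 0.12
improved = 0.04
print(f"relative reduction: {relative_reduction(baseline, improved):.1%}")
```

A relative reduction says nothing about the absolute error rate on its own: the same percentage can correspond to a large or a small absolute gap, depending on the baseline.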

In this paper, we propose an enhanced triplet method that improves the encoding process of embeddings by jointly utilizing a generative adversarial mechanism and multitasking optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and a softmax loss function. The GAN is introduced to increase the generality and diversity of samples, while the softmax loss reinforces speaker-related features. For simplicity, we term our method Multitasking Triplet Generative Adversarial Networks (MTGAN). Experiments on short utterances demonstrate that MTGAN reduces the verification equal error rate (EER) by 67% (relative) over the conventional i-vector method and by 32% (relative) over a state-of-the-art triplet loss method. This indicates that MTGAN outperforms triplet methods at expressing high-level speaker features.
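The idea of optimizing a triplet objective together with a softmax objective and an adversarial term can be sketched as a weighted sum of three losses. This is a minimal illustration, not the paper's formulation: the cosine distance, the margin, the loss weights, the non-saturating form of the adversarial term, and all function names are assumptions made for the example.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity (assumed metric; the paper may use another)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Hinge loss pulling anchor toward positive and away from negative."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Speaker-classification loss, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]


def adversarial_loss(disc_prob_fake: float) -> float:
    """Non-saturating generator loss -log D(G(z)) (assumed form)."""
    return -math.log(disc_prob_fake)


def multitask_loss(anchor, positive, negative, logits, label, disc_prob_fake,
                   w_triplet=1.0, w_softmax=1.0, w_adv=1.0) -> float:
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_triplet * triplet_loss(anchor, positive, negative)
            + w_softmax * softmax_cross_entropy(logits, label)
            + w_adv * adversarial_loss(disc_prob_fake))


# Toy inputs: two same-speaker embeddings and one from a different speaker.
loss = multitask_loss(
    anchor=[1.0, 0.0], positive=[0.9, 0.1], negative=[0.0, 1.0],
    logits=[2.0, 0.5], label=0, disc_prob_fake=0.5,
)
print(f"combined loss: {loss:.4f}")
```

In a real system each term would be computed on batches of network outputs and minimized by gradient descent; the scalar version here only shows how the objectives combine.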
