ASSD Oct 29, 2019

Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis

arXiv:1910.13054v1
1 citation
Originality: Incremental advance
AI Analysis

This work addresses security vulnerabilities in speaker verification, which is used in applications such as authentication. The advance is incremental because it builds on existing TTS and GAN methods.

The paper proposes a deep multi-speaker text-to-speech model for spoofing speaker verification (SV) systems. It reports high spoofing success rates against two state-of-the-art SV systems (i-vectors and Google's GE2E), and against anti-spoofing systems when their structures are exposed to the TTS system.
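To make the notion of a spoofing success rate concrete, here is a minimal sketch. It assumes the rate is the fraction of synthesized trials whose verification score meets or exceeds the system's acceptance threshold; the paper's exact definition may differ. The function name, scores, and threshold below are hypothetical and are not taken from the paper.

```python
def spoof_success_rate(scores, threshold):
    """Fraction of spoofed trials the verifier accepts (score >= threshold).

    Assumed definition for illustration only.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    accepted = sum(1 for s in scores if s >= threshold)
    return accepted / len(scores)


# Hypothetical similarity scores an SV system assigned to synthetic utterances.
synthetic_scores = [0.81, 0.42, 0.77, 0.90, 0.65]
rate = spoof_success_rate(synthetic_scores, threshold=0.7)
print(f"spoof success rate: {rate:.0%}")
```

In practice the threshold would typically be fixed beforehand on genuine and impostor trials, so that the rate measures how often synthetic speech passes a verifier tuned for real speech.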

This paper proposes a deep multi-speaker text-to-speech (TTS) model for spoofing speaker verification (SV) systems. The proposed model employs one network to synthesize time-downsampled mel-spectrograms from text input and another network to convert them to linear-frequency spectrograms, which are further converted to the time domain using the Griffin-Lim algorithm. Both networks are trained separately under the generative adversarial networks (GAN) framework. Spoofing experiments on two state-of-the-art SV systems (i-vectors and Google's GE2E) show that the proposed system can successfully spoof these systems with a high success rate. Spoofing experiments on anti-spoofing systems (i.e., binary classifiers for discriminating real and synthetic speech) also show a high spoof success rate when such anti-spoofing systems' structures are exposed to the proposed TTS system.
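The final stage described above, recovering a waveform from a linear-frequency magnitude spectrogram with the Griffin-Lim algorithm, can be sketched with NumPy alone. This is an illustrative sketch, not the authors' implementation: the FFT size, hop length, iteration count, and sample rate are assumptions, and the two GAN-trained networks that would produce the spectrogram are not reproduced. A test tone stands in for their output.

```python
import numpy as np


def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform with a Hann window; shape (frames, bins)."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)


def istft(spec, n_fft=256, hop=64):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for k in range(n_frames):
        start = k * hop
        out[start:start + n_fft] += frames[k] * window
        norm[start:start + n_fft] += window ** 2
    # Floor the normalizer to avoid dividing by near-zero window energy at the edges.
    return out / np.maximum(norm, 1e-3)


def griffin_lim(magnitude, n_iter=30, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between time and frequency domains.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analyzed signal.
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)


# Stand-in input: the magnitude spectrogram of a 440 Hz tone (assumed 16 kHz rate).
t = np.arange(2048) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
magnitude = np.abs(stft(tone))
waveform = griffin_lim(magnitude)
```

Griffin-Lim needs only magnitudes because it iteratively estimates a phase consistent with them, which is why a model that predicts spectrogram magnitudes can still yield audio without a learned vocoder.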

Code Implementations: 1 repo
