SD, AI, CL, MM, AS | Sep 9, 2022

DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion

arXiv:2209.04530v1 | 8 citations | h-index: 16 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses privacy and security concerns for users of speech-based online services by making it harder to impersonate a speaker or bypass speaker verification with leaked speech. It appears incremental because it builds on existing voice conversion methods.

The paper tackles speaker de-identification as a way to improve privacy in speech-based services. It proposes DeID-VC, a system that converts real speakers to pseudo speakers. Compared to their baseline, the authors report a 10% lower word error rate (better intelligibility) and a 5% higher equal error rate (stronger de-identification).
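The two objective metrics named above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names, the threshold sweep, and the toy scores are assumptions for illustration only. WER is computed here as word-level edit distance divided by the reference length, and EER as the operating point where the false acceptance rate and false rejection rate of a speaker verifier are closest.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping thresholds over all observed scores.

    A trial is accepted when score >= threshold. The result is the mean of
    the false acceptance rate (FAR) and false rejection rate (FRR) at the
    threshold where the two are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Toy verifier scores (made up): same-speaker trials vs. different-speaker trials.
genuine_scores = [0.9, 0.8, 0.3, 0.7]
impostor_scores = [0.1, 0.2, 0.4, 0.6]
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
print(equal_error_rate(genuine_scores, impostor_scores))
```

In a de-identification setting, a higher EER is desirable: if the verifier's scores for de-identified speech no longer separate the original speaker from other speakers, the speaker has become harder to recognize. A lower WER means a speech recognizer still transcribes the converted speech accurately.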

The widespread adoption of speech-based online services raises security and privacy concerns regarding the data that they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker to pseudo speakers, thus removing or obfuscating the speaker-dependent attributes from a spoken voice. The key components of DeID-VC include a Variational Autoencoder (VAE) based Pseudo Speaker Generator (PSG) and a voice conversion Autoencoder (AE) under zero-shot settings. With the help of the PSG, DeID-VC can assign unique pseudo speakers at the speaker level or even at the utterance level. In addition, two novel learning objectives are added to bridge the gap between training and inference of zero-shot voice conversion. We present our experimental results with word error rate (WER) and equal error rate (EER), along with three subjective metrics to evaluate the generated output of DeID-VC. The results show that our method substantially improves intelligibility (WER 10% lower) and de-identification effectiveness (EER 5% higher) compared to our baseline. Code and listening demo: https://github.com/a43992899/DeID-VC
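The difference between speaker-level and utterance-level pseudo speaker assignment can be sketched as follows. This is a toy illustration, not the DeID-VC implementation: ToyPseudoSpeakerGenerator and PseudoSpeakerAssigner are hypothetical names, the "decoder" is a random linear map standing in for a trained VAE decoder, and the latent and embedding sizes are arbitrary. It only assumes the usual VAE practice of sampling a latent vector from a standard normal prior and decoding it into a speaker embedding.

```python
import math
import random


class ToyPseudoSpeakerGenerator:
    """Hypothetical stand-in for a trained VAE decoder.

    The weights are random, not learned; a real PSG would use a neural network.
    """

    def __init__(self, latent_dim=8, embed_dim=16, seed=0):
        self.rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.weights = [[self.rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
                        for _ in range(embed_dim)]

    def sample(self):
        # Draw z from the standard normal prior.
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        # "Decode" z into a speaker embedding with a linear map.
        emb = [sum(w * zi for w, zi in zip(row, z)) for row in self.weights]
        # L2-normalize so every pseudo speaker embedding has unit length.
        norm = math.sqrt(sum(e * e for e in emb)) or 1.0
        return [e / norm for e in emb]


class PseudoSpeakerAssigner:
    """Hands out pseudo speaker embeddings per speaker or per utterance."""

    def __init__(self, generator, level="speaker"):
        if level not in ("speaker", "utterance"):
            raise ValueError("level must be 'speaker' or 'utterance'")
        self.generator = generator
        self.level = level
        self._cache = {}

    def embedding_for(self, speaker_id):
        if self.level == "utterance":
            # A fresh pseudo speaker for every utterance.
            return self.generator.sample()
        # One fixed pseudo speaker per real speaker.
        if speaker_id not in self._cache:
            self._cache[speaker_id] = self.generator.sample()
        return self._cache[speaker_id]


generator = ToyPseudoSpeakerGenerator(seed=1)
per_speaker = PseudoSpeakerAssigner(generator, level="speaker")
per_utterance = PseudoSpeakerAssigner(generator, level="utterance")
print(per_speaker.embedding_for("alice") == per_speaker.embedding_for("alice"))
print(per_utterance.embedding_for("alice") == per_utterance.embedding_for("alice"))
```

In a full system, the sampled embedding would be passed to the voice conversion autoencoder as the target speaker. Speaker-level assignment keeps a speaker's utterances consistent with one another, while utterance-level assignment gives each utterance a different pseudo speaker, so utterances from the same person no longer share one pseudo voice.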

Code Implementations: 1 repo
