SD, CR, LG, AS
Nov 10, 2022

Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples

arXiv:2211.05446v1 | 2 citations | h-index: 24
Originality: Incremental advance
AI Analysis

The work addresses privacy concerns for users of voice services by enabling non-intrusive de-identification. It is an incremental advance, since it builds on existing adversarial example methods.

The paper tackles voice identity leakage in voice data publishing by proposing a system that uses adversarial examples to balance privacy and utility. It reports 98% and 79% successful de-identification on mainstream and commercial speaker identification systems, respectively, with a Mel cepstral distortion of 4.31 dB and a mean opinion score of 4.48.
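For context on the reported distortion figure, the sketch below computes Mel cepstral distortion (MCD) in its commonly used form: per frame, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames. It is a minimal illustration, not the paper's evaluation code. The function name `mel_cepstral_distortion` is hypothetical, and the sketch assumes the two inputs are already time-aligned arrays of mel-cepstral coefficients with the 0th (energy) coefficient excluded.

```python
import numpy as np

def mel_cepstral_distortion(ref, deg):
    """Mean MCD in dB between two (frames, coefficients) arrays.

    Assumptions (not taken from the paper): frames are already
    time-aligned and the 0th cepstral coefficient has been dropped.
    """
    ref = np.asarray(ref, dtype=float)
    deg = np.asarray(deg, dtype=float)
    if ref.ndim != 2 or ref.shape != deg.shape:
        raise ValueError("inputs must be 2-D arrays of identical shape")
    diff = ref - deg
    # Per-frame distortion: sqrt(2 * sum_d (c_d - c'_d)^2)
    per_frame = np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Scale to decibels and average over frames
    return float((10.0 / np.log(10.0)) * np.mean(per_frame))
```

Lower values mean the processed speech stays closer to the original in the cepstral domain. Identical inputs give 0 dB.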

Faced with the threat of identity leakage during voice data publishing, users confront a privacy-utility dilemma when using convenient voice services. Existing studies de-identify users' voices through direct modification or text-based re-synthesis, but the results sound inconsistent to human listeners. In this paper, we propose a voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. As a result, our system protects user identity from exposure by Automatic Speaker Identification (ASI) while retaining perceptual voice quality for non-intrusive de-identification. Moreover, our system learns a compact speaker distribution through a conditional variational auto-encoder to sample diverse target embeddings on demand. By combining diverse target generation with input-specific perturbation construction, our system enables any-to-any identity transformation for adaptive de-identification. Experimental results show that our system achieves 98% and 79% successful de-identification on mainstream ASIs and commercial systems, respectively, with an objective Mel cepstral distortion of 4.31 dB and a subjective mean opinion score of 4.48.
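The difference between an additive and a convolutional perturbation can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names and the toy impulse response are hypothetical, and the optimization that would shape the impulse response against a speaker identification model is not shown.

```python
import numpy as np

def additive_perturb(x, delta):
    """Additive example: x' = x + delta, clipped to the [-1, 1] sample range."""
    return np.clip(np.asarray(x, dtype=float) + delta, -1.0, 1.0)

def convolutional_perturb(x, h):
    """Convolutional example: x' = x * h, where h acts like a room impulse response.

    The output is truncated to the input length and rescaled to the
    input's peak level (an illustrative choice, not from the paper).
    """
    x = np.asarray(x, dtype=float)
    y = np.convolve(x, h, mode="full")[: len(x)]
    peak = np.max(np.abs(y))
    if peak == 0.0:
        return y
    return y * (np.max(np.abs(x)) / peak)

def toy_impulse_response(length, decay, rng):
    """Hypothetical stand-in for a real RIR: a unit direct path followed by
    exponentially decaying noise. A real system would start from measured RIRs."""
    tail = 0.1 * rng.standard_normal(length) * np.exp(-decay * np.arange(length))
    tail[0] = 1.0  # direct sound path
    return tail

rng = np.random.default_rng(0)
speech = 0.5 * np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # 1 s test tone
h = toy_impulse_response(length=800, decay=0.01, rng=rng)
protected = convolutional_perturb(speech, h)
```

Because the perturbation enters as a filter rather than as added noise, the result resembles speech recorded in a slightly different room, which is the intuition behind the abstract's claim of less perceivable distortion.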

