Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release
This work addresses privacy concerns for users of smart devices such as the Amazon Echo and Apple HomePod by giving voiceprint privacy a formal definition. The contribution appears incremental, however, because it extends existing differential privacy concepts to speech data.
The authors address the protection of speaker identity (voiceprint) in speech data release. They propose a new privacy metric, voice-indistinguishability, based on differential privacy, and develop release mechanisms that experiments on public datasets show to be effective and efficient.
With the development of smart devices, such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, because such data contain the speaker's identifiable information, i.e., the voiceprint, which is considered a type of biometric identifier. Current studies on voiceprint privacy protection provide neither a meaningful privacy-utility trade-off nor a formal and rigorous definition of privacy. In this study, we design a novel and rigorous privacy metric for voiceprint privacy, referred to as voice-indistinguishability, by extending differential privacy. We also propose mechanisms and frameworks for privacy-preserving speech data release that satisfy voice-indistinguishability. Experiments on public datasets verify the effectiveness and efficiency of the proposed methods.
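The abstract does not state the definition itself. As a rough illustration of what "extending differential privacy" can look like, the sketch below contrasts standard differential privacy with its common metric-based generalization. It is an assumed form, not the paper's stated definition: the mechanism K, the voiceprint space X, the distance d, and the budget ε are all notation introduced here for illustration.

```latex
% Standard \varepsilon-differential privacy for a mechanism M,
% over any neighboring datasets D, D' and any output set S:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S]

% Assumed metric-based extension: for any two voiceprints x, x' in X,
% the bound scales with a distance d(x, x') between them, so similar
% voiceprints are harder to tell apart than dissimilar ones:
\Pr[K(x) \in S] \;\le\; e^{\varepsilon\, d(x, x')}\,\Pr[K(x') \in S]
\qquad \forall\, x, x' \in X,\ \forall\, S
```

Under this reading, a smaller ε gives stronger indistinguishability between voiceprints at the cost of utility, which is the privacy-utility trade-off the abstract refers to.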