SegReConcat: A Data Augmentation Method for Voice Anonymization Attack
The work addresses privacy risks in voice anonymization for users of speech data, but the contribution is incremental because it builds on existing attacker-side enhancement methods.
The paper tackles the problem of residual speaker cues in anonymized voice data by proposing SegReConcat, a data augmentation method that segments and rearranges speech to disrupt contextual cues, resulting in improved de-anonymization on five out of seven anonymization systems in the VoicePrivacy Attacker Challenge 2024.
Voice anonymization seeks to conceal the identity of the speaker while maintaining the utility of speech data. However, residual speaker cues often persist and pose privacy risks. We propose SegReConcat, a data augmentation method for attacker-side enhancement of automatic speaker verification systems. SegReConcat segments anonymized speech at the word level, rearranges the segments using random or similarity-based strategies to disrupt long-term contextual cues, and concatenates them with the original utterance, allowing an attacker to learn source speaker traits from multiple perspectives. The proposed method is evaluated in the VoicePrivacy Attacker Challenge 2024 framework across seven anonymization systems, where SegReConcat improves de-anonymization on five of the seven.
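
The segment, rearrange, and concatenate steps described above can be illustrated with a minimal sketch of the random rearrangement strategy. It is not the authors' implementation: the function name `seg_re_concat`, the representation of an utterance as a plain list of samples, and the `boundaries` argument (word boundaries as sample indices, assumed to come from some external word-level aligner) are all hypothetical. Placing the original utterance before the rearranged copy is also an assumption, and the similarity-based strategy is not shown because the abstract does not specify it.

```python
import random
from typing import List, Optional, Sequence


def seg_re_concat(
    samples: Sequence[float],
    boundaries: Sequence[int],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the original utterance followed by a word-shuffled copy.

    `boundaries` holds hypothetical word boundaries as sample indices;
    how they are obtained is outside the scope of this sketch.
    """
    rng = rng or random.Random()

    # Cut points: utterance start, each word boundary, utterance end.
    cuts = [0, *sorted(boundaries), len(samples)]

    # Slice the utterance into word-level segments, skipping empty ones.
    segments = [list(samples[a:b]) for a, b in zip(cuts, cuts[1:]) if b > a]

    # Random strategy: shuffle the segment order to disrupt long-term context.
    rng.shuffle(segments)
    rearranged = [x for seg in segments for x in seg]

    # Assumed order: original utterance first, rearranged copy second.
    return list(samples) + rearranged


if __name__ == "__main__":
    utterance = [float(i) for i in range(10)]
    augmented = seg_re_concat(utterance, [3, 7], random.Random(0))
    print(len(augmented))
```

In this sketch the augmented utterance is twice the length of the input: the first half preserves the original ordering, while the second half keeps each word-level segment intact but changes the order in which the segments appear.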