Protecting gender and identity with disentangled speech representations
This work addresses privacy concerns for individuals in speech processing applications, but the contribution is incremental because it builds on existing disentangled representation learning methods.
The paper tackles the problem of protecting sensitive biometric information in speech by learning privacy-preserving representations. It shows that protecting gender information is more effective than modelling speaker identity alone, and its experiments on LibriSpeech reduce gender recognition and speaker verification to random-guess accuracy.
Besides its linguistic content, our speech is rich in biometric information that can be inferred by classifiers. Learning privacy-preserving representations for speech signals enables downstream tasks without sharing unnecessary, private information about an individual. In this paper, we show that, when generating a non-sensitive representation of speech, protecting gender information is more effective than modelling speaker-identity information alone. Our method relies on reconstructing speech by decoding linguistic content along with gender information using a variational autoencoder. Specifically, we exploit disentangled representation learning to encode information about different attributes into separate subspaces that can be factorised independently. We present a novel way to encode gender information and disentangle two sensitive biometric identifiers, namely gender and identity, in a privacy-protecting setting. Experiments on the LibriSpeech dataset show that gender recognition and speaker verification can be reduced to a random guess, protecting against classification-based attacks.
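The abstract describes encoding different attributes into separate latent subspaces that can be factorised independently, but it does not give the latent layout or the loss. The following is therefore only a minimal sketch of one common way such a factorisation can appear in a variational autoencoder: a flat latent vector is split into named subspaces, and the KL regulariser against a standard normal prior is computed per subspace. The subspace names and sizes (`content`, `gender`), the helper names, and the example values are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent subspace."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def split_latent(z, dims):
    """Split a flat vector into named subspaces, in the order given by dims."""
    out, start = {}, 0
    for name, size in dims.items():
        out[name] = z[start:start + size]
        start += size
    if start != len(z):
        raise ValueError("subspace sizes must sum to the latent dimension")
    return out


def factorised_kl(mu, logvar, dims):
    """Return one KL term per subspace; with a diagonal posterior they simply add up."""
    mus = split_latent(mu, dims)
    logvars = split_latent(logvar, dims)
    return {name: gaussian_kl(mus[name], logvars[name]) for name in dims}


# Hypothetical layout: 4 dimensions for linguistic content, 2 for gender.
DIMS = {"content": 4, "gender": 2}

# Hypothetical encoder outputs (posterior mean and log-variance) for one utterance.
mu = [0.5, -0.5, 0.0, 1.0, 0.0, 0.0]
logvar = [0.0] * 6

per_subspace = factorised_kl(mu, logvar, DIMS)
total_kl = sum(per_subspace.values())
```

Because the posterior is diagonal, the total KL term equals the sum of the per-subspace terms, so each subspace could be weighted or replaced separately (for instance, substituting the gender subspace before decoding). Whether the paper does this, and with what weights, is not stated in the abstract.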