Reducing audio membership inference attack accuracy to chance: 4 defenses
This paper addresses privacy vulnerabilities in audio-based machine learning systems, but the contribution is incremental: it generalizes existing attacks from images to audio.
The paper tackles membership inference attacks on audio data for speaker identification. It demonstrates high attack precision and recall on the LibriSpeech and VOiCES datasets, and shows that defenses such as prediction obfuscation can reduce attack accuracy to chance.
It is critical to understand the privacy and robustness vulnerabilities of machine learning models as their deployment expands in scope. In membership inference attacks, adversaries can determine whether a particular set of data was used in training, putting the privacy of the data at risk. Existing work has mostly focused on image-related tasks; we generalize this type of attack to speaker identification on audio samples. We demonstrate attack precision of 85.9% and recall of 90.8% for LibriSpeech, and 78.3% precision and 90.7% recall for VOiCES (Voices Obscured in Complex Environmental Settings). We find that implementing defenses such as prediction obfuscation, defensive distillation, or adversarial training can reduce attack accuracy to chance.
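To make the attack and one defense concrete, here is a minimal sketch in Python. It is not the paper's method. It assumes a simple confidence-threshold attack (guess "member" when the model's top-class probability is high) and assumes that prediction obfuscation means returning only the top-1 label. The synthetic confidence ranges, the 10-class setup, the 0.9 threshold, and all function names are hypothetical choices for illustration only.

```python
import random


def make_prediction(top_confidence, num_classes=10):
    """Build a toy probability vector: one dominant class, the rest uniform."""
    rest = (1.0 - top_confidence) / (num_classes - 1)
    return [top_confidence] + [rest] * (num_classes - 1)


def threshold_attack(predictions, threshold):
    """Assumed attack: guess 'member' when the top confidence exceeds the threshold."""
    return [max(p) > threshold for p in predictions]


def obfuscate(probs):
    """Assumed defense: expose only the top-1 label as a one-hot vector."""
    top = probs.index(max(probs))
    return [1.0 if i == top else 0.0 for i in range(len(probs))]


def score(guesses, is_member):
    """Return (accuracy, precision, recall) of membership guesses."""
    pairs = list(zip(guesses, is_member))
    tp = sum(1 for g, m in pairs if g and m)
    fp = sum(1 for g, m in pairs if g and not m)
    fn = sum(1 for g, m in pairs if not g and m)
    accuracy = sum(1 for g, m in pairs if g == m) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def run_demo(n_per_class=500, threshold=0.9, seed=0):
    """Compare the attack on raw versus obfuscated predictions."""
    rng = random.Random(seed)
    # Synthetic assumption: the model is more confident on training members.
    members = [make_prediction(rng.uniform(0.80, 1.00)) for _ in range(n_per_class)]
    non_members = [make_prediction(rng.uniform(0.40, 0.95)) for _ in range(n_per_class)]
    predictions = members + non_members
    is_member = [True] * n_per_class + [False] * n_per_class

    raw = score(threshold_attack(predictions, threshold), is_member)
    hidden = [obfuscate(p) for p in predictions]
    defended = score(threshold_attack(hidden, threshold), is_member)
    return raw, defended


if __name__ == "__main__":
    raw, defended = run_demo()
    print("raw       (accuracy, precision, recall):", raw)
    print("defended  (accuracy, precision, recall):", defended)
```

Under these assumptions, every obfuscated vector has a top confidence of 1.0, so the attacker labels every sample a member. On a balanced evaluation set that gives 50% accuracy, which is chance. Recall is trivially 100% in that case, so accuracy and precision are the more informative figures when judging the defense.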