SD, LG, MM, AS, ML · Oct 22, 2019

Cross-Representation Transferability of Adversarial Attacks: From Spectrograms to Audio Waveforms

arXiv:1910.10106v4 · 9 citations
Originality: Incremental advance
AI Analysis

This paper reveals a vulnerability in audio classification systems: adversarial perturbations crafted on one representation (spectrograms) remain effective after conversion to another (waveforms). The advance is incremental but relevant to the security of audio processing pipelines.

The paper demonstrates that adversarial attacks on spectrogram-based audio classifiers can transfer to audio waveforms, reducing the accuracy of a 2D CNN from 81.87% to 12.09% and a 1D CNN from 78.29% to 27.91% on a western music dataset.

This paper shows the susceptibility of spectrogram-based audio classifiers to adversarial attacks and the transferability of such attacks to audio waveforms. Several adversarial attacks commonly used on images are applied to Mel-frequency and short-time Fourier transform (STFT) spectrograms, and the perturbed spectrograms are able to fool a 2D convolutional neural network (CNN). The perturbations these attacks introduce are visually imperceptible to humans. Furthermore, the audio waveforms reconstructed from the perturbed spectrograms are also able to fool a 1D CNN trained on the original audio. Experimental results on a dataset of western music show that the 2D CNN achieves a mean accuracy of up to 81.87% on legitimate examples, which drops to 12.09% on adversarial examples. Likewise, the 1D CNN achieves a mean accuracy of up to 78.29% on original audio samples, which drops to 27.91% on adversarial audio waveforms reconstructed from the perturbed spectrograms.
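The pipeline described in the abstract (perturb a spectrogram, then reconstruct a waveform from it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy linear classifier in place of the 2D CNN, a single-step sign-gradient attack (FGSM) as a stand-in for the attacks used in the paper, a non-overlapping rectangular-window STFT chosen because it is exactly invertible, and reconstruction that reuses the original phase. All function names, the weights `w`, and the step size `eps` are hypothetical.

```python
import numpy as np

def stft_frames(wave, frame_len):
    """Non-overlapping, rectangular-window STFT (a simplification: exactly invertible)."""
    n_frames = len(wave) // frame_len
    frames = wave[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)

def istft_frames(spec, frame_len):
    """Inverse of stft_frames: inverse FFT per frame, then concatenate the frames."""
    return np.fft.irfft(spec, n=frame_len, axis=1).reshape(-1)

def fgsm_on_magnitude(mag, w, b, label, eps):
    """One sign-gradient step against a toy logistic classifier (assumed, not the paper's CNN)."""
    score = float(np.sum(w * mag) + b)
    p = 1.0 / (1.0 + np.exp(-score))       # predicted probability of class 1
    grad = (p - label) * w                  # gradient of cross-entropy w.r.t. the magnitudes
    return np.maximum(mag + eps * np.sign(grad), 0.0)  # magnitudes must stay non-negative

rng = np.random.default_rng(0)
frame_len = 64
t = np.arange(frame_len * 16) / 8000.0
wave = np.sin(2 * np.pi * 440.0 * t)        # synthetic tone standing in for a music clip

spec = stft_frames(wave, frame_len)
mag, phase = np.abs(spec), np.angle(spec)

w = rng.normal(size=mag.shape)              # hypothetical classifier weights
b, label, eps = 0.0, 1.0, 0.05

adv_mag = fgsm_on_magnitude(mag, w, b, label, eps)

# Reconstruct a waveform from the perturbed magnitudes, reusing the original phase.
adv_wave = istft_frames(adv_mag * np.exp(1j * phase), frame_len)

clean_score = float(np.sum(w * mag) + b)
adv_score = float(np.sum(w * adv_mag) + b)
```

In this sketch `adv_score` is lower than `clean_score`, meaning the perturbation pushes the toy classifier away from the true label. `adv_wave` is the time-domain signal that would be passed to a waveform-based model to check whether the attack transfers.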

Code Implementations: 1 repo
