SD | AI | AS | Feb 4, 2021

Audio Adversarial Examples: Attacks Using Vocal Masks

arXiv:2102.02417v2
AI Analysis

This work introduces a new type of adversarial attack on speech recognition systems, highlighting a gap between machine perception and human perception of speech.

This paper demonstrates an audio adversarial attack on Speech-To-Text (STT) systems by overlaying a vocal mask generated from the original audio. The attack successfully fools five state-of-the-art STT systems, while human annotators are still able to consistently transcribe the speech.

We construct audio adversarial examples on automatic Speech-To-Text (STT) systems. Given any audio waveform, we produce another by overlaying an audio vocal mask generated from the original audio. We apply our audio adversarial attack to five state-of-the-art (SOTA) STT systems: DeepSpeech, Julius, Kaldi, wav2letter@anywhere and CMUSphinx. In addition, we engaged human annotators to transcribe the adversarial audio. Our experiments show that these adversarial examples fool SOTA STT systems, yet humans are able to consistently pick out the speech. The feasibility of this attack introduces a new domain to study machine and human perception of speech.
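
The overlay step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the vocal mask is generated, so the code below only shows the mixing of a mask into a waveform. The names `overlay_mask`, `placeholder_mask`, `mask_gain`, and `delay_samples` are hypothetical, and the placeholder mask (a delayed copy of the original) is a stand-in chosen for illustration, not the paper's method.

```python
import numpy as np


def overlay_mask(waveform: np.ndarray, mask: np.ndarray, mask_gain: float = 0.5) -> np.ndarray:
    """Mix a mask into a mono waveform with float samples in [-1, 1].

    Assumption: a plain additive mix with a scalar gain. The source does
    not specify how the mask is combined with the original audio.
    """
    if waveform.ndim != 1 or mask.ndim != 1:
        raise ValueError("expected mono (1-D) signals")
    n = len(waveform)
    if n == 0:
        return waveform.copy()
    # Zero-pad or trim the mask so both signals have the same length.
    if len(mask) < n:
        mask = np.pad(mask, (0, n - len(mask)))
    else:
        mask = mask[:n]
    mixed = waveform + mask_gain * mask
    # Rescale only when the mix would otherwise clip.
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed = mixed / peak
    return mixed


def placeholder_mask(waveform: np.ndarray, delay_samples: int) -> np.ndarray:
    """Hypothetical stand-in mask: the original signal delayed in time.

    This is NOT the mask construction from the paper; it only provides
    a signal derived from the original audio for the demo below.
    """
    delay = max(0, min(delay_samples, len(waveform)))
    delayed = np.zeros_like(waveform)
    delayed[delay:] = waveform[: len(waveform) - delay]
    return delayed


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate for the demo
    t = np.arange(sample_rate) / sample_rate
    original = 0.8 * np.sin(2 * np.pi * 220 * t)  # 1 s test tone in place of speech
    mask = placeholder_mask(original, delay_samples=800)
    adversarial = overlay_mask(original, mask, mask_gain=0.5)
    print(adversarial.shape, float(np.max(np.abs(adversarial))))
```

In an actual experiment, the mixed waveform would be passed to each STT system and to human annotators, and the transcriptions compared. That evaluation step is outside the scope of this sketch.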
