SD · CR · LG · AS · May 15, 2024

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

arXiv:2405.09470v1 · 5 citations · h-index: 5 · SecTL@AsiaCCS
Originality: Incremental advance
AI Analysis

This paper addresses security concerns for widely used ASR systems, but the approach is incremental because it builds on prior adversarial attack methods.

The paper tackles the problem of evaluating the robustness of automatic speech recognition (ASR) systems by proposing an attack method based on audio style transfer, achieving an 82% success rate in attacks while maintaining sound naturalness.

With the widespread application of Automatic Speech Recognition (ASR) systems, their security has received more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have shown that surreptitiously crafted adversarial perturbations can manipulate speech recognition systems into producing malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modification. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples from Text-to-Speech (TTS) synthesized audio. However, style modifications driven by optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of the Style Transfer Attack (STA), which applies style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve an attack success rate of 82%, while preserving sound naturalness according to our user study.
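
For context, the $\ell_p$ norm constraint that the abstract contrasts against can be illustrated with a minimal sketch. This is not the paper's STA or SCA method; it only shows how a conventional noise-based attack might keep a perturbation inside an $\ell_\infty$ or $\ell_2$ ball. The function names, the budget `eps`, and the assumption that audio samples lie in [-1, 1] are all hypothetical choices made for illustration.

```python
import math

def project_linf(delta, eps):
    """Clip each perturbation sample to [-eps, eps] (the l_inf ball)."""
    return [max(-eps, min(eps, d)) for d in delta]

def project_l2(delta, eps):
    """Rescale the perturbation so its Euclidean norm is at most eps (the l_2 ball)."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]

def apply_perturbation(audio, delta):
    """Add the perturbation and clamp to the assumed valid sample range [-1, 1]."""
    return [max(-1.0, min(1.0, a + d)) for a, d in zip(audio, delta)]

# Hypothetical usage: a toy 3-sample waveform and a candidate perturbation.
audio = [0.95, -0.5, 0.0]
delta = project_linf([0.5, -0.2, 0.05], eps=0.1)
adversarial = apply_perturbation(audio, delta)
```

In a real attack the perturbation would be updated by gradient steps against the ASR loss and projected after each step; that optimization loop is omitted here because the source does not describe it.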

