Language Dependencies in Adversarial Attacks on Speech Recognition Systems
This work addresses language-specific vulnerabilities in ASR systems, a topic relevant to security researchers. It is incremental, extending existing adversarial attack analysis to a new, cross-language comparative context.
The study compared the vulnerability of English and German automatic speech recognition systems to adversarial attacks and found statistically significant differences between the two languages in the computational effort required to generate successful adversarial examples.
Automatic speech recognition (ASR) systems are ubiquitous in everyday devices. They are vulnerable to adversarial attacks, in which manipulated input samples cause the ASR system to misrecognize the audio. While adversarial examples have already been analyzed for various English ASR systems, no comparative vulnerability analysis across languages exists. We compare the attackability of a German and an English ASR system, using DeepSpeech as an example, and investigate whether one of the two language-specific models is more susceptible to manipulation than the other. The results of our experiments suggest statistically significant differences between English and German in the computational effort necessary to generate successful adversarial examples. This result encourages further research into language-dependent characteristics in the robustness analysis of ASR.
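The abstract does not state how computational effort was measured or which statistical test was applied. The following is a minimal sketch under stated assumptions: effort is recorded once per adversarial example (for instance, as the number of optimization iterations an attack needs before the model outputs the target transcription), and the two models are compared with a two-sided permutation test on the difference of mean effort. The function `permutation_test` and its parameters are hypothetical illustrations, not the authors' procedure.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(
    effort_a: Sequence[float],
    effort_b: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Two-sided permutation test on the difference of mean attack effort.

    Hypothetical helper: effort_a and effort_b hold per-example effort values
    (assumed here to be attack iterations until success) for two ASR models,
    e.g. a German and an English model.
    Returns (observed mean difference a - b, estimated p-value).
    """
    if not effort_a or not effort_b:
        raise ValueError("both samples must be non-empty")

    observed = mean(effort_a) - mean(effort_b)
    pooled = list(effort_a) + list(effort_b)
    n_a = len(effort_a)
    rng = random.Random(seed)  # fixed seed for reproducible p-values

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the language label does not matter,
        # so randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Called with the per-example effort values collected for each model, it returns the observed mean difference and an estimated p-value. A permutation test is chosen here only because it makes no normality assumption about the effort distribution; a rank-based test would be an equally reasonable alternative.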