AS LG SD ML | Dec 14, 2021

Robustifying automatic speech recognition by extracting slowly varying features

Matías Pizarro, Dorothea Kolossa, Asja Fischer

arXiv:2112.07400v34.34 citations | h-index: 32

Originality: Incremental advance

AI Analysis

This paper addresses adversarial robustness in ASR systems, which matters to anyone relying on speech recognition technologies. It is an incremental improvement that focuses on one specific defense method.

The paper tackles the vulnerability of neural-network-based automatic speech recognition (ASR) systems to adversarial attacks. It proposes a defense mechanism that removes fast-changing features from audio signals using slow feature analysis, a low-pass filter, or both. The resulting model performs similarly to the baseline on clean data but is more than four times more robust against targeted adversarial attacks.
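To make the low-pass idea concrete, the following is a minimal sketch of filtering a waveform before it would be passed to a recogniser. It is an illustration under stated assumptions, not the authors' implementation: the function name `lowpass_fir`, the windowed-sinc design, the 101 taps, the 16 kHz sample rate, and the 1 kHz cutoff are all hypothetical choices made for the demo.

```python
import numpy as np


def lowpass_fir(signal, sample_rate, cutoff_hz, num_taps=101):
    """Windowed-sinc FIR low-pass filter (illustrative, not the paper's filter)."""
    if not 0 < cutoff_hz < sample_rate / 2:
        raise ValueError("cutoff_hz must lie between 0 and the Nyquist frequency")
    # Ideal low-pass impulse response, truncated to num_taps and centred on zero.
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate  # normalised cutoff in cycles per sample
    taps = 2 * fc * np.sinc(2 * fc * n)
    taps *= np.hamming(num_taps)  # taper the truncation to reduce ripple
    taps /= taps.sum()            # unity gain at 0 Hz
    # Symmetric taps with mode="same" give a zero-delay, same-length output.
    return np.convolve(signal, taps, mode="same")


# Toy input: a slow 200 Hz tone plus a fast 6 kHz tone standing in for
# rapidly varying content (assumed values, chosen only for the demo).
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
slow = np.sin(2 * np.pi * 200 * t)
fast = 0.5 * np.sin(2 * np.pi * 6000 * t)
filtered = lowpass_fir(slow + fast, sample_rate, cutoff_hz=1000)
```

Away from the edges, `filtered` should closely follow `slow`, because the 6 kHz component lies far above the cutoff. In the setting described here, such a filter would be applied to training and test audio alike, since the models are trained on data pre-processed in the same way.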

In the past few years, it has been shown that deep learning systems are highly vulnerable to attacks with adversarial examples. Neural-network-based automatic speech recognition (ASR) systems are no exception. Targeted and untargeted attacks can modify an audio input signal in such a way that humans still recognise the same words, while ASR systems are steered to predict a different transcription. In this paper, we propose a defense mechanism against targeted adversarial attacks that consists of removing fast-changing features from the audio signals, by applying slow feature analysis, a low-pass filter, or both, before feeding the input to the ASR system. We perform an empirical analysis of hybrid ASR models trained on data pre-processed in this way. While the resulting models perform quite well on benign data, they are significantly more robust against targeted adversarial attacks: our final proposed model performs similarly to the baseline model on clean data while being more than four times more robust.
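For readers unfamiliar with slow feature analysis, here is a minimal sketch of the textbook linear variant: whiten the input, then keep the directions whose time differences have the smallest variance. This is a generic illustration, not the pipeline used in the paper; the function name `linear_sfa`, the toy signals, and the mixing matrix are hypothetical.

```python
import numpy as np


def linear_sfa(x, n_components):
    """Textbook linear SFA on an array of shape (time, features)."""
    x = x - x.mean(axis=0)
    # Step 1: whiten, so every retained direction has unit variance.
    cov = x.T @ x / (len(x) - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10 * eigval.max()  # drop near-singular directions
    whitener = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = x @ whitener
    # Step 2: second moment of the discrete time derivative.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / (len(dz) - 1)
    # Step 3: eigh returns ascending eigenvalues, so the first columns
    # are the slowest directions.
    slowness, directions = np.linalg.eigh(dcov)
    weights = whitener @ directions[:, :n_components]
    return x @ weights, slowness[:n_components]


# Toy data: one slow and one fast sinusoid, linearly mixed (assumed values).
t = np.linspace(0, 2 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(40 * t)
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
mixed = np.column_stack([slow, fast]) @ mixing.T

features, slowness = linear_sfa(mixed, n_components=1)
```

On this toy input the single extracted feature should match `slow` up to sign and scale, which is the sense in which SFA keeps slowly varying content and discards fast-changing content.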

