AS SD Jul 21, 2020

Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

arXiv:2007.10723v1
Originality: Incremental advance
AI Analysis

This work addresses security concerns in speech recognition systems for applications like voice assistants, but it is incremental as it extends existing adversarial example methods to a hybrid architecture.

The paper tackles the vulnerability of hybrid CTC/attention speech recognition models to audio adversarial examples by proposing algorithms to generate such examples, and demonstrates their use in adversarial training to improve model robustness, with evaluation on the TEDlium v2 task.

Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems can achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, these ASR models are also more complex and more vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAEs) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms to generate AAEs that combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial training to obtain a more robust ASR model.
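The joint CTC-attention gradient idea can be illustrated with a minimal sketch. It assumes a single fast-gradient-sign step and a linear interpolation of the two gradients by a weight `ctc_weight`; the abstract does not specify the paper's exact update rule, so both choices are assumptions. The function name `joint_fgsm` and the toy quadratic gradients are hypothetical stand-ins for the gradients of a real model's CTC and attention losses with respect to the input audio.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Sequence[float]], Vector]


def sign(v: float) -> float:
    """Return -1.0, 0.0 or 1.0 depending on the sign of v."""
    return float((v > 0) - (v < 0))


def joint_fgsm(
    x: Sequence[float],
    grad_ctc: GradFn,
    grad_att: GradFn,
    epsilon: float,
    ctc_weight: float = 0.5,
) -> Vector:
    """One sign-gradient step on an interpolated CTC/attention gradient.

    Assumption: the joint gradient is ctc_weight * g_ctc
    + (1 - ctc_weight) * g_att, and the perturbation is bounded
    by epsilon per sample (untargeted, loss-increasing direction).
    """
    g_ctc = grad_ctc(x)
    g_att = grad_att(x)
    joint = [
        ctc_weight * c + (1.0 - ctc_weight) * a
        for c, a in zip(g_ctc, g_att)
    ]
    return [xi + epsilon * sign(g) for xi, g in zip(x, joint)]


# Hypothetical toy gradients standing in for the two loss branches:
# d/dx of sum((x - 1)^2) and d/dx of sum((x + 0.5)^2).
def toy_grad_ctc(x: Sequence[float]) -> Vector:
    return [2.0 * (xi - 1.0) for xi in x]


def toy_grad_att(x: Sequence[float]) -> Vector:
    return [2.0 * (xi + 0.5) for xi in x]


clean = [0.0, 2.0, -1.0]
adversarial = joint_fgsm(clean, toy_grad_ctc, toy_grad_att, epsilon=0.1)
print(adversarial)
```

For adversarial training as described above, examples produced this way would be mixed into the training data alongside the clean audio; in a real system the two gradient functions would come from backpropagation through the ASR model rather than from closed-form toy losses.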

