SD | LG | AS | Nov 11, 2022

On the robustness of non-intrusive speech quality model by adversarial examples

arXiv:2211.06508v1 | 4 citations | h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work examines the robustness of speech quality models used in audio processing and communication. It is incremental: it applies known adversarial attack and defense techniques to a specific domain.

The paper demonstrates that deep-learning-based speech quality predictors are vulnerable to adversarial perturbations: unnoticeable perturbations as small as -30 dB relative to the speech input can drastically alter predictions. It also explores adversarial training as a way to improve model robustness.

Recent work has shown that deep-learning-based models are effective for speech quality prediction and can outperform traditional metrics in several respects. Although network models have the potential to serve as surrogates for complex human hearing perception, their predictions may be unstable. This work shows that deep speech quality predictors can be vulnerable to adversarial perturbations: the prediction can be changed drastically by unnoticeable perturbations as small as $-30$ dB relative to the speech input. In addition to exposing this vulnerability, we explore and confirm the viability of adversarial training for strengthening model robustness.
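To make the $-30$ dB figure concrete, the following is a minimal sketch, not the authors' method. It assumes the level is a power ratio between perturbation and speech, which the abstract does not spell out. The linear `toy_score` predictor, the sine-tone "speech", and all function names are hypothetical stand-ins chosen so the example runs with the standard library only. A single sign-gradient step is used as one common way to build such a perturbation; the abstract does not say which attack the paper uses.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def scale_to_relative_db(perturbation, speech, level_db):
    """Rescale `perturbation` so its power is `level_db` dB relative to `speech`."""
    target_power = power(speech) * 10 ** (level_db / 10)
    gain = math.sqrt(target_power / power(perturbation))
    return [gain * v for v in perturbation]


def relative_level_db(perturbation, speech):
    """Perturbation-to-speech power ratio in dB."""
    return 10 * math.log10(power(perturbation) / power(speech))


def toy_score(weights, x):
    """Hypothetical linear 'quality predictor': score = w . x (assumption)."""
    return sum(w * v for w, v in zip(weights, x))


random.seed(0)
sample_rate = 16000
# Placeholder "speech": one second of a 220 Hz tone (assumption, not real speech).
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
weights = [random.gauss(0, 1) for _ in range(sample_rate)]

# For a linear score, the gradient with respect to the input equals `weights`,
# so stepping against its sign lowers the predicted score.
direction = [-1.0 if w > 0 else 1.0 for w in weights]
delta = scale_to_relative_db(direction, speech, -30.0)
adversarial = [s + d for s, d in zip(speech, delta)]

print(f"perturbation level: {relative_level_db(delta, speech):.2f} dB")
print(f"clean score:        {toy_score(weights, speech):.2f}")
print(f"adversarial score:  {toy_score(weights, adversarial):.2f}")
```

At $-30$ dB the perturbation carries one thousandth of the speech power, so its RMS amplitude is roughly 3% of the signal's. In this toy setting the score still shifts noticeably, because every sample is nudged in the direction that lowers the output.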

