ASLGSDSep 25, 2025

Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?

arXiv:2509.21087v2h-index: 9
Originality Incremental advance
AI Analysis

This addresses a security problem for users of speech enhancement systems, revealing a critical vulnerability in current models.

The paper tackles the vulnerability of modern speech enhancement systems to adversarial attacks, demonstrating that adversarial noise can manipulate enhanced speech to convey different semantic meanings, and finds that diffusion models with stochastic samplers are inherently robust to such attacks.

Machine learning approaches for speech enhancement are becoming increasingly expressive, enabling ever more powerful modifications of input signals. In this paper, we demonstrate that this expressiveness introduces a vulnerability: advanced speech enhancement models can be susceptible to adversarial attacks. Specifically, we show that adversarial noise, carefully crafted and psychoacoustically masked by the original input, can be injected such that the enhanced speech output conveys an entirely different semantic meaning. We experimentally verify that contemporary predictive speech enhancement models can indeed be manipulated in this way. Furthermore, we highlight that diffusion models with stochastic samplers exhibit inherent robustness to such adversarial attacks by design.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes