Discrete optimal transport is a strong audio adversarial attack
The work addresses a security vulnerability in deployed audio systems, but it is an incremental improvement over existing attack methods.
The paper tackles the problem of attacking audio anti-spoofing countermeasures by using discrete optimal transport as a black-box adversarial method. It achieves consistently high equal error rates (EER) on ASVspoof2019 and ASVspoof5 and outperforms several conventional attacks in cross-dataset transfer.
In this paper, we show that discrete optimal transport (DOT) is an effective black-box adversarial attack against modern audio anti-spoofing countermeasures (CMs). Our attack operates as a post-processing, distribution-alignment step: frame-level WavLM embeddings of generated speech are aligned to an unpaired bona fide pool via entropic OT and a top-$k$ barycentric projection, then decoded with a neural vocoder. Evaluated on ASVspoof2019 and ASVspoof5 with AASIST baselines, DOT yields consistently high equal error rate (EER) across datasets and remains competitive after CM fine-tuning, outperforming several conventional attacks in cross-dataset transfer. Ablation analysis highlights the practical impact of vocoder overlap. Results indicate that distribution-level alignment is a powerful and stable attack surface for deployed CMs.
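The alignment step described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes frame-level embeddings are already available as NumPy arrays (WavLM extraction and vocoder decoding are omitted), and it assumes uniform marginals, a squared Euclidean cost normalised by its maximum, and illustrative values for the regularisation strength `reg` and for `k`. The function names `sinkhorn_plan` and `topk_barycentric_projection` are hypothetical.

```python
import numpy as np


def sinkhorn_plan(X, Y, reg=0.1, n_iters=200):
    """Entropic OT plan between uniform measures on the rows of X (n, d) and Y (m, d).

    Assumptions (not stated in the source): squared Euclidean cost,
    uniform marginals, plain (non-log-domain) Sinkhorn iterations.
    """
    n, m = X.shape[0], Y.shape[0]
    # Pairwise squared Euclidean cost, scaled to [0, 1] for numerical stability.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    C = C / C.max()
    K = np.exp(-C / reg)          # Gibbs kernel
    a = np.full(n, 1.0 / n)       # source marginal (generated frames)
    b = np.full(m, 1.0 / m)       # target marginal (bona fide pool)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)           # rescale rows towards marginal a
        v = b / (K.T @ u)         # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]


def topk_barycentric_projection(P, Y, k=5):
    """Map each source frame to a weighted mean of its k heaviest targets in P."""
    # Indices of the k largest plan entries in each row (order within the k is arbitrary).
    idx = np.argpartition(-P, k - 1, axis=1)[:, :k]
    w = np.take_along_axis(P, idx, axis=1)
    w = w / w.sum(axis=1, keepdims=True)   # renormalise the kept mass per row
    # Y[idx] has shape (n, k, d); weight and sum over the k neighbours.
    return (w[:, :, None] * Y[idx]).sum(axis=1)
```

In this sketch, `X` would hold the embeddings of the generated utterance and `Y` the unpaired bona fide pool; the output of `topk_barycentric_projection(sinkhorn_plan(X, Y), Y, k)` would then be passed to a vocoder. Restricting the projection to the top-$k$ entries keeps each mapped frame a convex combination of a few real frames rather than an average over the whole pool.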