SDAIAS Aug 29, 2025

Generalizable Audio Spoofing Detection using Non-Semantic Representations

arXiv:2509.00186v1, 2 citations, h-index: 18, INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the need for robust countermeasures against deepfake audio attacks, which is critical for securing speech-based services. It appears incremental, however, because it builds on existing representation models.

The paper proposes a method for detecting synthetic audio spoofing attacks using non-semantic audio representations. The method achieves comparable performance on the in-domain test set and significantly outperforms state-of-the-art approaches on out-of-domain test sets.

Rapid advancements in generative modeling have made synthetic audio generation easy, leaving speech-based services vulnerable to spoofing attacks. Consequently, robust countermeasures are needed more than ever. Existing solutions for deepfake detection are often criticized for lacking generalizability, and they fail drastically when applied to real-world data. This study proposes a novel method for generalizable spoofing detection that leverages non-semantic universal audio representations. Extensive experiments were performed to find suitable non-semantic features using the TRILL and TRILLsson models. The results indicate that the proposed method achieves comparable performance on the in-domain test set while significantly outperforming state-of-the-art approaches on out-of-domain test sets. Notably, it demonstrates superior generalization on public-domain data, surpassing methods based on hand-crafted features, semantic embeddings, and end-to-end architectures.
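The abstract does not say which classifier sits on top of the TRILL/TRILLsson features. As a rough illustration of the general idea (frozen embeddings followed by a lightweight detector), the sketch below mean-pools frame-level embeddings and fits a logistic regression. Both choices are assumptions made for illustration, not the authors' method. Synthetic vectors stand in for real embeddings, and every name here (`mean_pool`, `train_logreg`, `spoof_score`, `fake_utterance`) is hypothetical.

```python
import math
import random


def mean_pool(frames):
    """Average frame-level embeddings (equal-length vectors) over time."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spoof_score(w, b, x):
    """Probability-like score that utterance embedding x is spoofed."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (w, b)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = spoof_score(w, b, x) - t
            for d in range(dim):
                gw[d] += err * x[d]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fake_utterance(rng, label, dim=8, n_frames=20):
    """Synthetic stand-in for frame-level embeddings (NOT real TRILL output).

    Assumption for the demo: spoofed (label 1) frames have a shifted mean.
    """
    shift = 1.0 if label == 1 else -1.0
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
labels = [i % 2 for i in range(200)]  # 0 = bona fide, 1 = spoofed
X = [mean_pool(fake_utterance(rng, t)) for t in labels]

# Train on the first 150 utterances, evaluate on the remaining 50.
w, b = train_logreg(X[:150], labels[:150])
preds = [1 if spoof_score(w, b, x) >= 0.5 else 0 for x in X[150:]]
accuracy = sum(p == t for p, t in zip(preds, labels[150:])) / 50
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

In a real experiment, `fake_utterance` would be replaced by embeddings extracted from audio with a pretrained model, and evaluation would use separate in-domain and out-of-domain test sets, as the abstract describes.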

