SD LG Sep 15, 2025

Improving Out-of-Domain Audio Deepfake Detection via Layer Selection and Fusion of SSL-Based Countermeasures

Pierre Serrano, Raphaël Duroselle, Florian Angulo, Jean-François Bonastre, Olivier Boeffard

arXiv:2509.12003v14.01 citations h-index: 38

Originality: Incremental advance

AI Analysis

This work addresses the challenge of detecting audio deepfakes in varied, unseen scenarios, offering an incremental improvement in efficiency and generalization for security applications.

The paper tackles the poor generalization of audio deepfake detection systems to out-of-domain conditions by analyzing the layers of pre-trained SSL encoders and selecting the best one, which reduces system parameters by up to 80% while maintaining strong performance.

Audio deepfake detection systems based on frozen pre-trained self-supervised learning (SSL) encoders show a high level of performance when combined with layer-weighted pooling methods, such as multi-head factorized attentive pooling (MHFA). However, they still struggle to generalize to out-of-domain (OOD) conditions. We tackle this problem by studying the behavior of six pre-trained SSL encoders on four test corpora. We perform a layer-by-layer analysis to determine which layers contribute most. Next, we study the pooling head, comparing a strategy based on a single layer with automatic selection via MHFA. We observe that selecting the best layer gives very good results while reducing system parameters by up to 80%. Performance also varies widely with the test corpus and the SSL model, showing that the pre-training strategy of the encoder plays a role. Finally, score-level fusion of several encoders improves generalization to OOD attacks.
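
The contrast the abstract draws between layer-weighted pooling, single-layer selection, and score-level fusion can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax-weighted layer sum is a much simpler stand-in for MHFA, fusion by plain averaging is an assumption (the abstract does not state the fusion rule), and all function names and toy values are hypothetical.

```python
import math
from statistics import mean


def softmax(weights):
    """Turn unconstrained layer weights into positive weights that sum to 1."""
    peak = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_feats, layer_weights):
    """Layer-weighted pooling (simplified): softmax-weighted sum over layers."""
    probs = softmax(layer_weights)
    dim = len(layer_feats[0])
    return [
        sum(p * feats[d] for p, feats in zip(probs, layer_feats))
        for d in range(dim)
    ]


def select_layer(layer_feats, index):
    """Single-layer alternative: keep one layer's features, ignore the rest."""
    return list(layer_feats[index])


def fuse_scores(scores_per_system):
    """Score-level fusion (assumed rule): average each utterance's scores."""
    return [mean(utt_scores) for utt_scores in zip(*scores_per_system)]


# Toy data (hypothetical): 3 encoder layers, 2-dimensional features.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = weighted_layer_sum(layers, [0.0, 0.0, 0.0])  # equal layer weights
single = select_layer(layers, 1)                      # keep only layer 1

# Two detection systems scoring the same two utterances.
fused = fuse_scores([[0.9, 0.2], [0.7, 0.4]])
```

With a single selected layer, the layers above it never need to be computed, which is one plausible reading of how the reported parameter reduction arises; the abstract itself does not spell out the mechanism.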
