AS LG SD · Sep 17, 2025

Mixture of Low-Rank Adapter Experts in Generalizable Audio Deepfake Detection

Janne Laakkonen, Ivan Kukanov, Ville Hautamäki

arXiv:2509.13878v1 · 2.33 citations · h-index: 10 · APSIPA

Originality: Incremental advance

AI Analysis

This work addresses the need for detection systems that stay robust as audio deepfake attacks evolve. The advance is incremental, since it builds on existing foundation models and LoRA techniques.

The paper tackles the problem of audio deepfake detection models failing to generalize to novel deepfake methods by proposing a mixture-of-LoRA-experts approach, which reduces the average out-of-domain equal error rate from 8.55% to 6.08%.
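For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate can be approximated from detector scores, plus the relative reduction implied by the two figures quoted above. The function names, the threshold sweep, and the convention that a higher score means "more likely bonafide" are assumptions made for illustration; the paper's own evaluation code is not shown on this page.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumed convention: a higher score means "more likely bonafide".
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a bonafide clip scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed clip scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # The EER is read off where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# Toy scores, invented for illustration only.
toy_eer = equal_error_rate([0.9, 0.4], [0.6, 0.1])
# Out-of-domain averages quoted in the summary: 8.55% -> 6.08%.
ood_gain = relative_reduction(8.55, 6.08)
```

With the quoted averages, the absolute drop of 2.47 percentage points corresponds to a relative reduction of roughly 29%.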

Foundation models such as Wav2Vec2 excel at representation learning in speech tasks, including audio deepfake detection. However, after being fine-tuned on a fixed set of bonafide and spoofed audio clips, they often fail to generalize to novel deepfake methods not represented in training. To address this, we propose a mixture-of-LoRA-experts approach that integrates multiple low-rank adapters (LoRA) into the model's attention layers. A routing mechanism selectively activates specialized experts, enhancing adaptability to evolving deepfake attacks. Experimental results show that our method outperforms standard fine-tuning in both in-domain and out-of-domain scenarios, reducing equal error rates relative to baseline models. Notably, our best MoE-LoRA model lowers the average out-of-domain EER from 8.55% to 6.08%, demonstrating its effectiveness in achieving generalizable audio deepfake detection.
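The idea in the abstract can be illustrated with a small, dependency-free sketch of one projection layer that carries several LoRA experts and a router. Everything specific here is an assumption rather than the authors' implementation: the class name MoELoRALinear, the softmax router with top-k selection, the alpha/rank scaling, and the initialization scheme. The abstract only states that multiple low-rank adapters sit in the attention layers and that a routing mechanism selectively activates them.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MoELoRALinear:
    """Hypothetical frozen projection plus a routed mixture of LoRA experts."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=2,
                 alpha=4.0, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.02) for _ in range(cols)]
                    for _ in range(rows)]

        self.weight = rand(d_out, d_in)        # stands in for frozen W
        self.router = rand(num_experts, d_in)  # one logit per expert
        # Each expert e holds a low-rank pair: down (rank x d_in) and
        # up (d_out x rank). Zero-initialised `up` means the layer starts
        # out identical to the frozen projection (common LoRA practice).
        self.down = [rand(rank, d_in) for _ in range(num_experts)]
        self.up = [[[0.0] * rank for _ in range(d_out)]
                   for _ in range(num_experts)]
        self.top_k = top_k
        self.scale = alpha / rank

    def route(self, x):
        """Return {expert_index: gate} for the top-k experts (assumed rule)."""
        probs = softmax(matvec(self.router, x))
        chosen = sorted(range(len(probs)), key=probs.__getitem__,
                        reverse=True)[: self.top_k]
        norm = sum(probs[e] for e in chosen)
        return {e: probs[e] / norm for e in chosen}

    def forward(self, x):
        out = matvec(self.weight, x)  # frozen path: W x
        for e, gate in self.route(x).items():
            # Expert path: gate * scale * up_e (down_e x)
            delta = matvec(self.up[e], matvec(self.down[e], x))
            out = [o + gate * self.scale * d for o, d in zip(out, delta)]
        return out


layer = MoELoRALinear(d_in=8, d_out=8)
x = [0.1 * i for i in range(8)]
gates = layer.route(x)
y = layer.forward(x)
```

In this sketch only the router and the expert matrices would be trained while the base weight stays frozen, which is the usual motivation for LoRA-style adapters; whether the paper uses sparse top-k or dense gating is not stated on this page.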
