Generalizable speech deepfake detection via meta-learned LoRA
This work addresses the challenge of keeping speech deepfake detection reliable in security applications as spoofing attacks evolve, offering an incremental improvement through parameter-efficient adaptation.
The paper frames speech deepfake detection under distribution shift as a domain generalization problem. The resulting model lowers the average equal error rate (EER) from 8.84% with full fine-tuning to 5.30%, while updating only about 1.1% of the parameters that full fine-tuning updates.
Reliable detection of speech deepfakes (spoofs) must remain effective when the distribution of spoofing attacks shifts. We frame the task as domain generalization and show that inserting Low-Rank Adaptation (LoRA) adapters into every attention head of a self-supervised learning (SSL) backbone, then training only those adapters with Meta-Learning Domain Generalization (MLDG), yields strong zero-shot performance. The resulting model updates about 3.6 million parameters, roughly 1.1% of the 318 million updated in full fine-tuning, yet surpasses a fully fine-tuned counterpart on five of six evaluation corpora. A first-order MLDG loop encourages the adapters to focus on cues that persist across attack types, lowering the average equal error rate (EER) from 8.84% for the fully fine-tuned model to 5.30% with our best MLDG-LoRA configuration. Our findings show that combining meta-learning with parameter-efficient adaptation is an effective approach to zero-shot, distribution-shift-aware speech deepfake detection.
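To make the training recipe concrete, the following is a minimal NumPy sketch of a first-order MLDG update applied only to LoRA parameters. It is a toy illustration, not the paper's implementation: the SSL backbone is replaced by a single frozen linear layer, the "domains" are synthetic, and every name and hyperparameter (`w0`, `A`, `B`, `make_domain`, `mldg_step`, `alpha`, `beta`, `lr`, the rank and step count) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                                # feature size and LoRA rank (illustrative)
w_true = rng.normal(size=d)                # hypothetical labeling direction shared by all domains
w0 = rng.normal(size=(1, d))               # frozen stand-in for the backbone weight
A = rng.normal(size=(r, d)) / np.sqrt(d)   # LoRA down-projection (trainable)
B = np.zeros((1, r))                       # LoRA up-projection, zero init so the adapter starts as a no-op

def make_domain(n=200):
    """Synthetic 'attack type': same labeling rule, different input mean."""
    shift = 0.5 * rng.normal(size=d)
    X = rng.normal(size=(n, d)) + shift
    y = (X @ w_true > 0).astype(float)
    return X, y

def loss_and_grads(A, B, X, y):
    """Logistic loss of the adapted layer W = w0 + B @ A, with gradients for A and B only."""
    W = w0 + B @ A                         # (1, d); w0 itself is never updated
    p = 1.0 / (1.0 + np.exp(-(X @ W.T)))   # (n, 1) predicted probabilities
    y = y.reshape(-1, 1)
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    gW = (p - y).T @ X / len(X)            # (1, d) gradient w.r.t. the effective weight
    return loss, B.T @ gW, gW @ A.T        # chain rule: dL/dA = B^T gW, dL/dB = gW A^T

def mldg_step(A, B, domains, alpha=0.1, beta=1.0, lr=0.1):
    """One first-order MLDG update on the adapter parameters."""
    order = rng.permutation(len(domains))
    X_te, y_te = domains[order[0]]         # one source domain held out as meta-test
    X_tr = np.concatenate([domains[i][0] for i in order[1:]])
    y_tr = np.concatenate([domains[i][1] for i in order[1:]])
    _, gA_tr, gB_tr = loss_and_grads(A, B, X_tr, y_tr)
    # Virtual step on meta-train, then evaluate the meta-test gradient there.
    A_v, B_v = A - alpha * gA_tr, B - alpha * gB_tr
    _, gA_te, gB_te = loss_and_grads(A_v, B_v, X_te, y_te)
    # First-order approximation: use the meta-test gradient directly,
    # without differentiating through the virtual step.
    return A - lr * (gA_tr + beta * gA_te), B - lr * (gB_tr + beta * gB_te)

domains = [make_domain() for _ in range(3)]    # source domains seen in training
X_all = np.concatenate([X for X, _ in domains])
y_all = np.concatenate([y for _, y in domains])

w0_before = w0.copy()
initial_loss = loss_and_grads(A, B, X_all, y_all)[0]
for _ in range(300):
    A, B = mldg_step(A, B, domains)
final_loss = loss_and_grads(A, B, X_all, y_all)[0]

X_new, y_new = make_domain()                   # unseen domain, evaluated zero-shot
print("source loss before/after:", initial_loss, final_loss)
print("unseen-domain loss:", loss_and_grads(A, B, X_new, y_new)[0])
```

The design point this sketch tries to show is that the meta-test gradient is taken after a virtual step on the meta-train domains, so the update favors adapter directions that help on a held-out domain as well as on the domains just trained on. Only `A` and `B` change; the frozen weight `w0` plays the role of the backbone.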