SD · LG · AS · May 29, 2025

Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes

arXiv:2505.23619v1 · 2 citations · h-index: 27 · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, adaptable detection of audio deepfakes, which is crucial for security and media integrity. The advance is incremental: it builds on existing embedding models and contributes a novel adaptation approach.

The paper tackles the problem of adapting deepfake detection to unseen text-to-speech (TTS) models with minimal data. It introduces ADD-GP, a few-shot adaptive framework based on a Gaussian Process classifier that achieves strong performance and one-shot adaptability, with competitive accuracy on a new benchmark dataset.

Recent advances in Text-to-Speech (TTS) models, particularly in voice cloning, have intensified the demand for adaptable and efficient deepfake detection methods. As TTS systems continue to evolve, detection models must be able to adapt efficiently to previously unseen generation models with minimal data. This paper introduces ADD-GP, a few-shot adaptive framework based on a Gaussian Process (GP) classifier for Audio Deepfake Detection (ADD). We show how combining a powerful deep embedding model with the flexibility of Gaussian processes can achieve strong performance and adaptability. Additionally, we show that this approach can be used for personalized detection, with greater robustness to new TTS models and one-shot adaptability. To support our evaluation, we construct a benchmark dataset for this task using new state-of-the-art voice cloning models.
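To illustrate why a GP on top of a frozen embedding model lends itself to few-shot adaptation, here is a minimal sketch, assuming utterance embeddings have already been extracted (toy 2-D vectors stand in for them). It is not the authors' ADD-GP implementation: for simplicity it regresses labels of +1 (spoof) and -1 (bona fide) with a Gaussian likelihood instead of a true GP classification likelihood, and the class name `GPDeepfakeDetector`, the RBF kernel, the lengthscale, and the noise level are all hypothetical choices. Adapting to a new TTS system only conditions the posterior on the new labelled clips; no gradient training is involved.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Unit-variance squared-exponential kernel between rows of a and b."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


class GPDeepfakeDetector:
    """Hypothetical sketch: GP over fixed utterance embeddings.

    Labels are +1 (spoof) and -1 (bona fide), regressed with a Gaussian
    likelihood as a stand-in for a proper GP classifier likelihood.
    """

    def __init__(self, lengthscale=1.0, noise=0.1):
        self.lengthscale = lengthscale
        self.noise = noise

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X, self.X, self.lengthscale)
        K += self.noise * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)  # K = L L^T
        # alpha = K^{-1} y via two triangular solves
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, self.y))
        return self

    def adapt(self, X_new, y_new):
        """Few-shot adaptation: condition on a handful of labelled clips
        from an unseen TTS system; the embedding model is never retrained."""
        X = np.vstack([self.X, np.asarray(X_new, dtype=float)])
        y = np.concatenate([self.y, np.asarray(y_new, dtype=float)])
        return self.fit(X, y)

    def predict(self, X_query):
        """Return posterior mean score (> 0 means spoof) and latent variance."""
        Xq = np.asarray(X_query, dtype=float)
        Ks = rbf_kernel(Xq, self.X, self.lengthscale)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.maximum(1.0 - np.sum(v**2, axis=0), 0.0)
        return mean, var


# Toy 2-D "embeddings": bona fide near the origin, a known TTS near (3, 0).
X_train = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                    [3.0, 0.0], [3.2, 0.0], [3.0, 0.2]])
y_train = np.array([-1, -1, -1, 1, 1, 1])
det = GPDeepfakeDetector().fit(X_train, y_train)

query = np.array([[0.1, 3.05]])  # clip from an unseen TTS near (0, 3)
mean_before, var_before = det.predict(query)

det.adapt([[0.0, 3.0]], [1])  # one-shot: a single labelled spoof clip
mean_after, var_after = det.predict(query)
print(mean_before, var_before, mean_after, var_after)
```

Before adaptation, the clip from the unseen generator scores slightly below zero (it is misread as bona fide) with latent variance close to 1, so the GP at least signals that it is uncertain. After a single labelled example, the score turns positive and the variance drops sharply. Refitting from scratch costs O(n³); extending the existing Cholesky factor by one row would bring each one-shot update down to O(n²).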

Code Implementations: 1 repo