AS LG Aug 28, 2025

Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System

Hashim Ali, Surya Subramani, Lekha Bollinani, Nithin Sai Adupa, Sali El-Loh, Hafiz Malik

arXiv:2508.20983v23.32 citations h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the detection of synthetic speech under varied conditions for security applications. It is an incremental advance that builds on existing methods and datasets.

The paper tackles robust audio deepfake detection across multiple tasks by exploring self-supervised learning front-ends and training data strategies, achieving second place in two of the three challenge tasks and showing strong generalization.

The SAFE Challenge evaluates synthetic speech detection across three tasks: unmodified audio, processed audio with compression artifacts, and laundered audio designed to evade detection. We systematically explore self-supervised learning (SSL) front-ends, training data compositions, and audio length configurations for robust deepfake detection. Our AASIST-based approach incorporates a WavLM large front-end with RawBoost augmentation, trained on a multilingual dataset of 256,600 samples spanning 9 languages and over 70 TTS systems from CodecFake, MLAAD v5, SpoofCeleb, Famous Figures, and MAILABS. Through extensive experimentation with different SSL front-ends, three training data versions, and two audio lengths, we achieved second place in both Task 1 (unmodified audio detection) and Task 3 (laundered audio detection), demonstrating strong generalization and robustness.
