MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
For researchers and practitioners in audio anti-spoofing, this dataset provides a large-scale, multilingual training resource that yields stronger detectors than comparable datasets and complements established benchmarks.
The paper introduces MLAAD, a large multi-language synthetic audio dataset (1002.9 hours, 54 languages, 175 TTS models) for training audio deepfake detectors. Models trained on MLAAD outperform those trained on InTheWild and FakeOrReal. MLAAD is also complementary to ASVspoof 2019: across eight test datasets, each of the two training sets gives the better result on four.
This paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), version 10: a dataset of synthetic audio for training and evaluating audio deepfake detection models. It features 175 Text-to-Speech (TTS) models and comprises a total of 1002.9 hours of synthetic voice in 54 different languages. To evaluate the dataset, we train three state-of-the-art deepfake detection models on MLAAD and observe that, as a training resource, it outperforms comparable datasets such as InTheWild and FakeOrReal. Moreover, MLAAD proves to be complementary to the renowned ASVspoof 2019 dataset: in tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperformed each other, each excelling on four datasets. By publishing the dataset and making a trained model accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists and contributing to global efforts against audio spoofing and deepfakes.
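The cross-dataset comparison described above can be read as a simple win tally: for each test dataset, note which training set gives the better score, then count wins per training set. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The function name `tally_wins`, the test-set names, the assumption that the metric is an error rate (lower is better), and all score values are hypothetical placeholders and are not results from the paper.

```python
from collections import Counter


def tally_wins(scores, lower_is_better=True):
    """Count, per training set, how many test sets it scores best on.

    scores: {test_set_name: {training_set_name: metric_value}}
    Ties go to the training set listed first for that test set.
    """
    pick = min if lower_is_better else max
    wins = Counter()
    for by_training_set in scores.values():
        best = pick(by_training_set, key=by_training_set.get)
        wins[best] += 1
    return dict(wins)


# Placeholder error rates only; these are NOT numbers from the paper.
placeholder_scores = {
    f"test_set_{i}": {
        "MLAAD": 0.10 if i % 2 == 0 else 0.20,
        "ASVspoof 2019": 0.20 if i % 2 == 0 else 0.10,
    }
    for i in range(8)
}

print(tally_wins(placeholder_scores))
```

With these placeholder values, each training set wins on four of the eight test sets, which is the pattern the abstract describes as complementary. Real use would substitute the measured metric for each (test set, training set) pair.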