50.8SDMay 18
MLAAD: The Multi-Language Audio Anti-Spoofing DatasetNicolas M. Müller, Piotr Kawa, Wei Herng Choong et al.
This paper presents the Multi-Language Audio Anti-Spoofing Dataset (MLAAD), version 10: a dataset of synthetic audio to train and evaluate audio deepfake detection models. It features 175 Text-to-Speech (TTS) models, comprising a total of 1002.9 hours of synthetic voice in 54 different languages. To evaluate this dataset, we train three state-of-the-art deepfake detection models with MLAAD and observe that it demonstrates superior performance to comparable datasets like InTheWild and FakeOrReal when used as a training resource. Moreover, compared to the renowned ASVspoof 2019 dataset, MLAAD proves to be a complementary resource. In tests across eight datasets, MLAAD and ASVspoof 2019 alternately outperformed each other, each excelling on four datasets. By publishing the dataset and making a trained model accessible via an interactive webserver, we aim to democratize anti-spoofing technology, making it accessible beyond the realm of specialists, and contributing to global efforts against audio spoofing and deepfakes.
25.2SDMay 21
Eroding Trust in Real Speech: A Large-Scale Study of Human Audio Deepfake PerceptionNicolas M. Müller, Wei Herng Choong
Audio deepfakes have improved rapidly recently, yet their effect on human trust in real speech remains unstudied. We present the largest listening study on audio deepfake perception to date, collecting 35,532 judgments from 1,768 participants across 138 text-to-speech and voice conversion systems. Our central finding is a skepticism shift: compared to a 2021 baseline, human accuracy on fake samples barely changed (72.9% to 71.2%), but accuracy on real samples dropped from 72.7% to 64.1%. Participants are not worse at detecting synthesis artifacts; rather, they increasingly distrust authentic speech. Samples generated by commercial and autoregressive language model systems proved hardest to detect (61.3 - 65.9%), while those from traditional seq2seq and flow-matching models remain easier to spot (75.4 - 76.8%). An ML detector that served as a reference point maintained over 94.5% accuracy across all conditions. Our results suggest that the primary threat posed by modern deepfakes may not be mere deception, but the erosion of trust in genuine audio.
LGAug 21, 2025
GRASPED: Graph Anomaly Detection using Autoencoder with Spectral Encoder and Decoder (Full Version)Wei Herng Choong, Jixing Liu, Ching-Yu Kao et al.
Graph machine learning has been widely explored in various domains, such as community detection, transaction analysis, and recommendation systems. In these applications, anomaly detection plays an important role. Recently, studies have shown that anomalies on graphs induce spectral shifts. Some supervised methods have improved the utilization of such spectral domain information. However, they remain limited by the scarcity of labeled data due to the nature of anomalies. On the other hand, existing unsupervised learning approaches predominantly rely on spatial information or only employ low-pass filters, thereby losing the capacity for multi-band analysis. In this paper, we propose Graph Autoencoder with Spectral Encoder and Spectral Decoder (GRASPED) for node anomaly detection. Our unsupervised learning model features an encoder based on Graph Wavelet Convolution, along with structural and attribute decoders. The Graph Wavelet Convolution-based encoder, combined with a Wiener Graph Deconvolution-based decoder, exhibits bandpass filter characteristics that capture global and local graph information at multiple scales. This design allows for a learning-based reconstruction of node attributes, effectively capturing anomaly information. Extensive experiments on several real-world graph anomaly detection datasets demonstrate that GRASPED outperforms current state-of-the-art models.
CRFeb 27, 2025
DeePen: Penetration Testing for Audio Deepfake DetectionNicolas Müller, Piotr Kawa, Adriana Stan et al.
Deepfakes - manipulated or forged audio and video media - pose significant security risks to individuals, organizations, and society at large. To address these challenges, machine learning-based classifiers are commonly employed to detect deepfake content. In this paper, we assess the robustness of such classifiers through a systematic penetration testing methodology, which we introduce as DeePen. Our approach operates without prior knowledge of or access to the target deepfake detection models. Instead, it leverages a set of carefully selected signal processing modifications - referred to as attacks - to evaluate model vulnerabilities. Using DeePen, we analyze both real-world production systems and publicly available academic model checkpoints, demonstrating that all tested systems exhibit weaknesses and can be reliably deceived by simple manipulations such as time-stretching or echo addition. Furthermore, our findings reveal that while some attacks can be mitigated by retraining detection systems with knowledge of the specific attack, others remain persistently effective. We release all associated code.