SD LG AS Jun 2, 2023

Improved DeepFake Detection Using Whisper Features

arXiv:2306.01428v1, 71 citations, h-index: 10
Originality: Incremental advance
AI Analysis

This paper addresses the threat that audio DeepFakes pose to security applications, but it is an incremental advance because it builds on existing detection methods.

The paper tackles audio DeepFake detection by investigating the Whisper automatic speech recognition model as a front-end. It shows that Whisper-based features improve detection for each of the three tested models and reduce Equal Error Rate by 21% relative to recent results on the In-The-Wild dataset.

With the recent influx of voice generation methods, the threat posed by audio DeepFakes (DF) is ever-increasing. Several detection methods have been presented as countermeasures. Many of them are based on so-called front-ends, which transform the raw audio to emphasize features crucial for assessing the genuineness of an audio sample. Our contribution is an investigation of the state-of-the-art Whisper automatic speech recognition model as a DF detection front-end. We compare various combinations of Whisper and well-established front-ends by training 3 detection models (LCNN, SpecRNet, and MesoNet) on the widely used ASVspoof 2021 DF dataset and later evaluating them on the DF In-The-Wild dataset. We show that using Whisper-based features improves detection for each model and outperforms recent results on the In-The-Wild dataset, reducing Equal Error Rate by 21%.
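The headline metric above, Equal Error Rate (EER), is the operating point at which the rate of spoofed samples wrongly accepted equals the rate of bona fide samples wrongly rejected. The sketch below is a minimal illustration of that definition, not the authors' evaluation code. It assumes that a higher score means "more likely bona fide" and approximates the EER by sweeping thresholds over the observed scores. The function names and the numbers in the usage lines are hypothetical, and `relative_reduction` assumes the 21% figure is a relative reduction, which the abstract does not state explicitly.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means the sample is more likely bona fide.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)  # a threshold that rejects everything

    best_gap, eer = None, None
    for t in thresholds:
        # False acceptance: spoofed samples scoring at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide samples scoring below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.21 for a 21% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, for illustration only (not results from the paper).
bonafide = [0.9, 0.7, 0.4, 0.8]
spoof = [0.1, 0.5, 0.3, 0.2]
eer = equal_error_rate(bonafide, spoof)
```

With finite score sets the two error rates rarely cross exactly, so the sketch reports the midpoint of the closest pair. Evaluation toolkits typically interpolate along the ROC curve instead, which can give slightly different values.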

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
