SDAIAS Dec 31, 2025

Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks

arXiv:2601.04227v1
h-index: 2
Originality: Synthesis-oriented
AI Analysis

The paper addresses the risk of impersonation, fraud, and misinformation in communication channels such as phone and video calls, but its contribution is incremental: it applies existing methods to a new dataset under realistic conditions.

This study tackles real-time detection of AI-generated speech produced by Retrieval-based Voice Conversion (RVC) attacks, achieving reliable detection from short-window acoustic features even in noisy backgrounds.

Generative audio technologies now enable highly realistic voice cloning and real-time voice conversion, increasing the risk of impersonation, fraud, and misinformation in communication channels such as phone and video calls. This study investigates real-time detection of AI-generated speech produced using Retrieval-based Voice Conversion (RVC), evaluated on the DEEP-VOICE dataset, which includes authentic and voice-converted speech samples from multiple well-known speakers. To simulate realistic conditions, deepfake generation is applied to isolated vocal components, followed by the reintroduction of background ambiance to suppress trivial artifacts and emphasize conversion-specific cues. We frame detection as a streaming classification task by dividing audio into one-second segments, extracting time-frequency and cepstral features, and training supervised machine learning models to classify each segment as real or voice-converted. The proposed system enables low-latency inference, supporting both segment-level decisions and call-level aggregation. Experimental results show that short-window acoustic features can reliably capture discriminative patterns associated with RVC speech, even in noisy backgrounds. These findings demonstrate the feasibility of practical, real-time deepfake speech detection and underscore the importance of evaluating under realistic audio mixing conditions for robust deployment.
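The streaming framing described in the abstract (one-second segments, a per-segment decision, then call-level aggregation) can be illustrated with a minimal sketch. This is not the paper's implementation. The function names, the choice to drop a trailing partial window, the 0.5 threshold, and the mean-probability aggregation rule are all assumptions made for illustration. The per-segment scorer is passed in as a callable that stands in for the feature extraction and trained classifier, which the abstract does not specify in detail.

```python
from statistics import mean
from typing import Callable, List, Sequence

SEGMENT_SECONDS = 1.0  # window length stated in the abstract


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        seconds: float = SEGMENT_SECONDS) -> List[Sequence[float]]:
    """Cut a mono signal into non-overlapping fixed-length windows.

    Assumption: a trailing partial window is dropped; padding it would
    be an equally reasonable choice.
    """
    size = int(round(sample_rate * seconds))
    if size <= 0:
        raise ValueError("segment size must be positive")
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def aggregate_call(segment_probs: Sequence[float], threshold: float = 0.5) -> dict:
    """Combine per-segment P(voice-converted) values into a call-level decision.

    Assumption: mean-probability voting with a fixed threshold. This rule
    is hypothetical and is not taken from the paper.
    """
    if not segment_probs:
        raise ValueError("no segments to aggregate")
    score = mean(segment_probs)
    return {
        "segment_labels": [p >= threshold for p in segment_probs],
        "call_score": score,
        "call_is_converted": score >= threshold,
    }


def detect_call(samples: Sequence[float], sample_rate: int,
                score_segment: Callable[[Sequence[float]], float]) -> dict:
    """Score every one-second segment, then aggregate over the whole call.

    `score_segment` is a placeholder for feature extraction plus a trained
    model that returns P(voice-converted) for one segment.
    """
    segments = split_into_segments(samples, sample_rate)
    return aggregate_call([score_segment(seg) for seg in segments])
```

In a live setting the same pieces could be applied incrementally: each new one-second window is scored as it arrives, and the call-level score is recomputed over the windows seen so far, so the added latency is bounded by the window length plus the scoring time.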

