Physics-Guided Deepfake Detection for Voice Authentication Systems
This paper addresses security vulnerabilities in voice authentication systems for edge deployments, though the contribution appears incremental, since it combines existing techniques such as physics-based features and uncertainty estimation.
The paper tackles the dual threats of deepfake synthesis attacks and control-plane poisoning in voice authentication systems at the network edge, and proposes a framework that fuses physics-guided features with self-supervised learning for robust detection.
Voice authentication systems deployed at the network edge face dual threats: (a) sophisticated deepfake synthesis attacks and (b) control-plane poisoning in distributed federated learning protocols. We present a framework that couples physics-guided deepfake detection with uncertainty-aware edge learning. The framework fuses interpretable physics features that model vocal tract dynamics with representations from a self-supervised learning module. The fused representations are then processed by a Multi-Modal Ensemble Architecture, followed by a Bayesian ensemble that provides uncertainty estimates. By combining physics-based evaluation of audio samples with uncertainty estimates, the proposed framework remains robust to both advanced deepfake attacks and sophisticated control-plane poisoning, addressing the complete threat model for networked voice authentication.
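The fusion and uncertainty steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: concatenation as the fusion rule, logistic ensemble members, and the entropy-based split of uncertainty are all assumptions chosen for illustration, and the names `fuse_features`, `binary_entropy`, and `ensemble_predict` are hypothetical.

```python
import numpy as np

def fuse_features(physics_feats, ssl_embedding):
    """Fuse the two feature vectors by concatenation (assumed fusion rule)."""
    return np.concatenate([physics_feats, ssl_embedding])

def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli distribution with parameter p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def ensemble_predict(fused, weights, biases):
    """Score one sample with M logistic members (stand-ins for the ensemble).

    weights has shape (M, D) and biases has shape (M,).
    Returns (mean fake probability, total uncertainty, epistemic uncertainty).
    """
    logits = weights @ fused + biases
    probs = 1.0 / (1.0 + np.exp(-logits))     # per-member P(fake)
    mean_p = probs.mean()
    total = binary_entropy(mean_p)            # entropy of the averaged prediction
    aleatoric = binary_entropy(probs).mean()  # average per-member entropy
    epistemic = total - aleatoric             # disagreement between members
    return float(mean_p), float(total), float(epistemic)
```

In a sketch like this, a sample on which the ensemble members disagree gets a high epistemic term. A deployment could treat such samples as candidates for rejection or further checks, although how the paper uses its uncertainty estimates is not specified in the abstract.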