SD AI · May 7

Quantum Kernels for Audio Deepfake Detection Using Spectrogram Patch Features

Lisan Al Amin, Rakib Hossain, Mahbubul Islam, Faisal Quader, Thanh Thi Nguyen

arXiv:2605.06035 · 38.8

Predicted impact: top 68% in SD · last 90 days

Originality: Incremental advance

AI Analysis

For audio deepfake detection, this work provides a practical quantum kernel framework that leverages time-frequency structure, though the improvement over classical baselines is modest and the approach is domain-specific.

The paper proposes Q-Patch, a quantum feature map for audio that encodes local time-frequency patches from mel-spectrograms into quantum states. On an audio spoofing detection task, Q-Patch achieves an AUROC of 0.87, outperforming an RBF-SVM baseline (0.82) on the same features.

Quantum machine learning has emerged as a promising tool for pattern recognition, yet many audio-focused approaches still treat spectrograms as generic images and do not explicitly exploit their time-frequency structure. We propose Q-Patch, a quantum feature map tailored to audio that encodes local time-frequency patches from mel-spectrograms into quantum states using shallow, hardware-efficient circuits with adjacency-aware entanglement. Each selected patch is summarized by a compact four-dimensional acoustic descriptor and mapped to a four-qubit circuit with depth at most three, enabling practical quantum kernel construction under near-term constraints. We evaluate Q-Patch on an audio spoofing detection task using a controlled, balanced protocol and compare it with size-matched classical baselines. Q-Patch improves discrimination between bona fide and spoofed samples, achieving an area under the receiver operating characteristic curve (AUROC) of 0.87, compared with 0.82 for a radial basis function support vector machine (RBF-SVM) trained on the same patch-level features. Kernel-space analysis further reveals a clear class structure, with cross-class similarity around 0.615 and within-class self-similarity of 1.00. Overall, Q-Patch provides a practical framework for incorporating time-frequency-aware representations into quantum kernel learning for audio authenticity assessment in low-resource settings.
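The abstract does not give the Q-Patch circuit itself, so the following is only a minimal NumPy sketch of the general idea: a four-dimensional descriptor is encoded on four qubits, neighbouring qubits are entangled, and a fidelity kernel is computed from state overlaps. Everything specific here is an assumption made for illustration: the RY angle encoding, the data-dependent ZZ phases between adjacent qubits (in the style of ZZ feature maps), the scaling of descriptors to [0, π], and the names `feature_state`, `quantum_kernel`, and `ADJACENT_PAIRS`. The circuit has three layers (one rotation layer and two layers of nearest-neighbour ZZ gates), which fits the stated depth budget but is not claimed to be the authors' design.

```python
import numpy as np

N_QUBITS = 4
# Assumed nearest-neighbour layout: (0,1) and (2,3) run in parallel, then (1,2).
ADJACENT_PAIRS = [(0, 1), (2, 3), (1, 2)]


def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a state stored as a (2,)*N tensor."""
    state = np.tensordot(gate, state, axes=([1], [q]))
    return np.moveaxis(state, 0, q)


def apply_rzz(state, theta, q1, q2):
    """Apply exp(-i * theta/2 * Z_q1 Z_q2), a diagonal two-qubit phase gate."""
    z = np.array([1.0, -1.0])
    shape1 = [1] * N_QUBITS
    shape2 = [1] * N_QUBITS
    shape1[q1] = 2
    shape2[q2] = 2
    zz = z.reshape(shape1) * z.reshape(shape2)  # +1 or -1 per basis state
    return state * np.exp(-0.5j * theta * zz)


def feature_state(x):
    """Map a 4-d descriptor x to a 16-dimensional state vector (illustrative)."""
    x = np.asarray(x, dtype=float)
    state = np.zeros((2,) * N_QUBITS, dtype=complex)
    state[(0,) * N_QUBITS] = 1.0  # start in |0000>
    for q in range(N_QUBITS):  # layer 1: angle encoding
        state = apply_1q(state, ry(x[q]), q)
    for a, b in ADJACENT_PAIRS:  # layers 2-3: adjacency-aware entanglement
        state = apply_rzz(state, x[a] * x[b], a, b)
    return state.reshape(-1)


def quantum_kernel(X, Y):
    """Fidelity kernel K[i, j] = |<psi(X[i]) | psi(Y[j])>|^2."""
    SX = np.array([feature_state(x) for x in X])
    SY = np.array([feature_state(y) for y in Y])
    return np.abs(SX.conj() @ SY.T) ** 2


# Toy descriptors drawn at random; these are placeholders, not real audio features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi, size=(5, N_QUBITS))
K = quantum_kernel(X, X)
```

By construction every diagonal entry of `K` equals 1, since each state has unit overlap with itself, and off-diagonal entries lie in [0, 1]. A Gram matrix of this kind can be passed to a support vector machine that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.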
