SD · Apr 21

Audio Spoof Detection with GaborNet

arXiv:2604.19209

AI Analysis

This work offers an incremental improvement in audio spoof detection by replacing sinc functions with Gabor filters in neural network frontends, addressing frequency-domain distortions.

The paper proposes GaborNet, a Gabor filter-based ingestion layer for audio spoof detection, and evaluates it within RawNet2 and RawGAT-ST architectures. It achieves competitive performance on the ASVspoof 2019 LA dataset, with a 0.22% equal error rate (EER) for RawNet2 and 0.18% EER for RawGAT-ST, outperforming SincNet-based baselines.
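The equal error rate quoted above is the operating point at which the false acceptance rate (spoofed audio accepted) equals the false rejection rate (bona fide audio rejected). As a minimal sketch, assuming higher scores mean "more likely bona fide", it could be approximated as follows. The function name `equal_error_rate` and the threshold sweep are illustrative assumptions, not the evaluation code used in the paper.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means the trial looks more like bona fide speech.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return 0.5 * (far[idx] + frr[idx])
```

Calling `equal_error_rate(bonafide_scores, spoof_scores)` returns a fraction between 0 and 1, so an EER of 0.22% corresponds to a return value of about 0.0022.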

One direction of development in extracting features from audio signals is to process raw samples in the time domain. This approach appears to be effective, especially in the era of neural networks. An example is SincNet, in which the core of the neural network layer is a set of sinc functions that are convolved with the input signal. Because the sinc functions are truncated to a finite length, distortions appear in the frequency domain of the convolved signal, just as they do when a signal is windowed. Recently, a new approach has been developed that replaces the sinc functions with Gabor filters. Because the outputs of these filters are complex-valued, further operations had to be applied, such as a squared modulus or Gaussian Lowpass Pooling. In this work, an ingestion layer based on a bank of Gabor filters, named GaborNet, and its modifications are examined intensively within the popular RawNet2 and RawGAT-ST architectures, both of which were developed for audio spoof detection. The work also investigates audio augmentation using codec conversions, room responses, and additive noise.
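The processing chain described in the abstract (complex Gabor filters, squared modulus, Gaussian lowpass pooling) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the fixed center frequencies and bandwidth, the kernel length, and the pooling width and stride are all hypothetical choices, and a trainable layer would typically learn the filter parameters rather than fix them.

```python
import numpy as np


def gabor_kernel(center_hz, bandwidth_hz, sample_rate, length):
    """Complex Gabor kernel: a Gaussian envelope times a complex sinusoid.

    Assumption: bandwidth_hz is the standard deviation of the filter's
    Gaussian frequency response, so the time-domain std is 1 / (2*pi*bandwidth_hz).
    """
    t = (np.arange(length) - (length - 1) / 2) / sample_rate  # centered time axis, seconds
    sigma_t = 1.0 / (2.0 * np.pi * bandwidth_hz)
    envelope = np.exp(-0.5 * (t / sigma_t) ** 2)
    carrier = np.exp(2j * np.pi * center_hz * t)
    kernel = envelope * carrier
    return kernel / np.abs(kernel).sum()  # roughly unit gain at the center frequency


def gabor_frontend(signal, center_freqs_hz, bandwidth_hz=100.0,
                   sample_rate=16000, length=401):
    """Convolve the raw waveform with each Gabor kernel and take the squared modulus."""
    channels = []
    for fc in center_freqs_hz:
        kernel = gabor_kernel(fc, bandwidth_hz, sample_rate, length)
        response = np.convolve(signal, kernel, mode="same")  # complex-valued output
        channels.append(np.abs(response) ** 2)               # real, non-negative energy
    return np.stack(channels)


def gaussian_lowpass_pool(energy, pool_sigma=40.0, stride=160):
    """Smooth each channel with a Gaussian window, then subsample.

    Assumption: one fixed window width (in samples) is shared by all channels.
    """
    half = int(4 * pool_sigma)
    n = np.arange(-half, half + 1)
    window = np.exp(-0.5 * (n / pool_sigma) ** 2)
    window /= window.sum()
    smoothed = np.stack([np.convolve(ch, window, mode="same") for ch in energy])
    return smoothed[:, ::stride]
```

Because the Gaussian envelope decays smoothly to near zero inside the kernel, truncating it to a finite length disturbs the frequency response far less than truncating a sinc function does, which is the motivation the abstract gives for the replacement. The squared modulus turns the complex filter output into a real-valued energy envelope that later real-valued network layers can consume.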
