Texture-based Presentation Attack Detection for Automatic Speaker Verification
This work addresses security risks in biometric systems for users and applications, but it is incremental as it builds on existing texture-based detection approaches.
The paper tackles the vulnerability of automatic speaker verification systems to presentation attacks by exploring texture descriptors applied to speech spectrogram images, resulting in a method that rejects at most 16% of bona fide presentations while accepting only 1% of attack presentations.
Biometric systems are now employed across a broad range of applications. They provide high security and efficiency and, in many cases, are user friendly. Despite these and other advantages, biometric systems in general, and automatic speaker verification (ASV) systems in particular, can be vulnerable to attack presentations. The most recent ASVspoof 2019 competition showed that most forms of attack can be detected reliably with ensemble classifier-based presentation attack detection (PAD) approaches. These, however, depend fundamentally on the complementarity of the systems in the ensemble. With the aim of increasing the generalisability of PAD solutions, this paper reports our exploration of texture descriptors applied to the analysis of speech spectrogram images. In particular, we propose a common Fisher vector feature space based on a generative model. Experimental results show the soundness of our approach: at most 16 in 100 bona fide presentations are rejected, whereas only one in 100 attack presentations is accepted.
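The reported operating point pairs a bona fide rejection rate with a fixed attack acceptance rate; in ISO/IEC 30107-3 terminology these are commonly called BPCER and APCER. The following is a minimal sketch of how such a pair can be read off a set of PAD scores. It is not the evaluation code used in the paper: the function name `bpcer_at_apcer` is hypothetical, and the convention that a higher score means "more likely bona fide" is an assumption.

```python
import numpy as np


def bpcer_at_apcer(bona_fide_scores, attack_scores, target_apcer=0.01):
    """Find the lowest threshold whose APCER does not exceed target_apcer.

    Assumption: higher scores mean "more likely bona fide", and a
    presentation is accepted when its score is >= the threshold.
    Returns (threshold, apcer, bpcer) as floats.
    """
    bona_fide = np.asarray(bona_fide_scores, dtype=float)
    attacks = np.asarray(attack_scores, dtype=float)

    # Candidate thresholds: every observed score (sorted ascending by
    # np.unique), plus +inf, which rejects everything and so always
    # satisfies the APCER constraint.
    candidates = np.unique(np.concatenate([bona_fide, attacks, [np.inf]]))

    for threshold in candidates:
        # APCER: proportion of attack presentations wrongly accepted.
        apcer = np.mean(attacks >= threshold)
        if apcer <= target_apcer:
            # BPCER: proportion of bona fide presentations wrongly rejected.
            bpcer = np.mean(bona_fide < threshold)
            return float(threshold), float(apcer), float(bpcer)
```

Because the candidates are scanned in ascending order, the first threshold that meets the APCER constraint is also the one that rejects the fewest bona fide presentations. With a target of 0.01, the returned BPCER is the kind of figure the abstract quotes as "bona fide presentations rejected" when "one in 100 attack presentations" is accepted.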