AS | SD | Aug 20, 2020

Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV

arXiv:2008.08865v1 | 6 citations
Originality: Incremental advance
AI Analysis

This work addresses anti-spoofing in automatic speaker verification, offering an incremental improvement for enhancing security in voice-based systems.

The paper addresses the limited discriminative power of single-resolution feature maps in anti-spoofing for automatic speaker verification. It stacks spectrograms extracted with different window lengths as CNN input channels, which consistently outperforms score fusion across CNN architectures at only a marginal increase in computational cost.

This paper presents a simple but effective method that uses multi-resolution feature maps with convolutional neural networks (CNNs) for anti-spoofing in automatic speaker verification (ASV). The central idea is to alleviate the problem that the feature maps commonly used in anti-spoofing networks are insufficient for building discriminative representations of audio segments, as they are often extracted by a single-length sliding window. The resulting trade-off between time and frequency resolution restricts the information in a single spectrogram. The proposed method improves both frequency resolution and time resolution by stacking multiple spectrograms that are extracted using different window lengths. These are fed into a convolutional neural network in the form of multiple channels, making it possible to extract more information from input signals while only marginally increasing computational costs. The efficiency of the proposed method has been confirmed on the ASVspoof 2019 database. We show that the use of the proposed multi-resolution inputs consistently outperforms that of score fusion across different CNN architectures. Moreover, computational cost remains small.
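The stacking idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the window lengths (256, 512, 1024), the hop length, the FFT size, the log-magnitude features, and the function names `log_spectrogram` and `multi_resolution_input` are all assumptions chosen for illustration. Each window is zero-padded to a common FFT size and a shared hop length is used, so every spectrogram has the same shape and can be stacked along a channel axis.

```python
import numpy as np

def log_spectrogram(signal, win_length, hop_length=160, n_fft=1024):
    """Log-magnitude STFT using a Hann window of `win_length` samples.

    The window is centered and zero-padded to `n_fft`, so the output
    shape does not depend on `win_length` (an illustrative choice).
    """
    window = np.zeros(n_fft)
    start = (n_fft - win_length) // 2
    window[start:start + win_length] = np.hanning(win_length)

    # Reflect-pad so frames are centered on multiples of hop_length.
    padded = np.pad(signal, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length

    # Build a (n_frames, n_fft) index matrix to slice all frames at once.
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = padded[idx] * window

    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-6).T  # shape: (freq_bins, time_frames)

def multi_resolution_input(signal, win_lengths=(256, 512, 1024),
                           hop_length=160, n_fft=1024):
    """Stack spectrograms from several window lengths as CNN input channels."""
    channels = [log_spectrogram(signal, w, hop_length, n_fft) for w in win_lengths]
    return np.stack(channels, axis=0)  # shape: (channels, freq_bins, time_frames)

# Example: one second of synthetic audio at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
features = multi_resolution_input(audio)
print(features.shape)  # (3, 513, 101)
```

Short windows give sharper time resolution and long windows give sharper frequency resolution; placing them in separate channels lets the first convolutional layer combine both views at each time-frequency position. Only that first layer sees the extra channels, which is consistent with the abstract's claim that the added computational cost is marginal.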

