ASSD · Apr 6, 2021

An Initial Investigation for Detecting Partially Spoofed Audio

arXiv:2104.02518v2 · 65 citations
AI Analysis

The paper addresses a practical security gap in audio spoofing detection for applications such as voice authentication, but its contribution is incremental: it evaluates existing countermeasures on a new dataset.

The paper tackles the problem of detecting partially spoofed audio, where an utterance contains both spoofed and genuine segments. It finds that countermeasures trained on fully spoofed data degrade substantially when tested on partially spoofed data, whereas countermeasures trained on partially spoofed data perform reliably on both types.

All existing databases of spoofed speech contain attack data that is spoofed in its entirety. In practice, it is entirely plausible that successful attacks can be mounted with utterances that are only partially spoofed. By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances. This hypothesis raises the obvious question: 'Can we detect partially-spoofed audio?' This paper introduces a new database of partially-spoofed data, named PartialSpoof, to help address this question. This new database enables us to investigate and compare the performance of countermeasures on both utterance- and segmental-level labels. Experimental results using the utterance-level labels reveal that the reliability of countermeasures trained to detect fully-spoofed data degrades substantially when tested with partially-spoofed data, whereas countermeasures trained on partially-spoofed data perform reliably on both fully- and partially-spoofed utterances. Additional experiments using segmental-level labels show that spotting injected spoofed segments included in an utterance is a much more challenging task, even if the latest countermeasure models are used.
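
To make the distinction between segmental-level and utterance-level labels concrete, here is a minimal Python sketch of how an utterance-level category could be derived from per-segment labels. The label strings, function names, and the rule that any spoofed segment makes the whole utterance count as spoofed are illustrative assumptions based on the definition in the abstract; they are not the PartialSpoof database's actual file format or tooling.

```python
from typing import List

# Hypothetical label strings, chosen for illustration only.
BONA_FIDE = "bonafide"
SPOOF = "spoof"


def utterance_category(segment_labels: List[str]) -> str:
    """Classify an utterance from its per-segment labels.

    Returns "bonafide" if no segment is spoofed, "fully_spoofed" if every
    segment is spoofed, and "partially_spoofed" for a mix of both.
    """
    if not segment_labels:
        raise ValueError("need at least one segment label")
    n_spoof = sum(1 for label in segment_labels if label == SPOOF)
    if n_spoof == 0:
        return "bonafide"
    if n_spoof == len(segment_labels):
        return "fully_spoofed"
    return "partially_spoofed"


def utterance_label(segment_labels: List[str]) -> str:
    """Binary utterance-level label (assumed rule): spoofed if any segment is."""
    return SPOOF if any(label == SPOOF for label in segment_labels) else BONA_FIDE


if __name__ == "__main__":
    mixed = [BONA_FIDE, SPOOF, BONA_FIDE]
    print(utterance_category(mixed), utterance_label(mixed))
```

Under this assumed rule, a single injected spoofed segment is enough to flip the utterance-level label, whereas the segmental-level task additionally requires locating which segments were injected.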
