Data Quality as Predictor of Voice Anti-Spoofing Generalization
This work addresses the unreliability of voice anti-spoofing systems in security applications, but its contribution is incremental: it analyzes existing methods rather than proposing a new one.
The paper tackles the poor generalization of voice anti-spoofing methods across corpora by developing an interpretative framework that assesses how data quality factors, such as spectral information and speaker population, affect performance. It finds that these factors significantly influence generalization.
Voice anti-spoofing aims to classify a given utterance as either a bonafide human sample or a spoofing attack (e.g., a synthetic or replayed sample). Many anti-spoofing methods have been proposed, but most of them fail to generalize across domains (corpora) -- and we do not know \emph{why}. We outline a novel interpretative framework for gauging the impact of data quality upon anti-spoofing performance. Our within- and between-domain experiments pool data from seven public corpora and three anti-spoofing methods based on Gaussian mixture and convolutional neural network models. We assess the impacts of long-term spectral information, speaker population (through x-vector speaker embeddings), signal-to-noise ratio, and selected voice quality features.
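
To make two of the data quality factors named above more concrete, the following is a minimal sketch of how long-term spectral information and signal-to-noise ratio could be computed for a single utterance. It is an illustration under stated assumptions, not the paper's implementation: the function names, frame length, hop size, and Hann window are hypothetical choices, and the SNR helper assumes the clean and noise components are known separately, whereas real corpora would require a blind estimate.

```python
import numpy as np


def long_term_average_spectrum(signal, frame_len=512, hop=256):
    """Average the power spectrum over all frames and return it in dB.

    Hypothetical helper: frame_len, hop, and the Hann window are
    illustrative assumptions, not settings taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    power = np.zeros(frame_len // 2 + 1)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames
    # Small constant avoids log of zero in silent bins.
    return 10.0 * np.log10(power + 1e-12)


def snr_db(clean, noise):
    """SNR in dB, assuming clean and noise signals are available separately."""
    p_signal = np.mean(np.square(np.asarray(clean, dtype=float)))
    p_noise = np.mean(np.square(np.asarray(noise, dtype=float)))
    return 10.0 * np.log10(p_signal / p_noise)


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone
    ltas = long_term_average_spectrum(tone)
    print("peak bin:", int(np.argmax(ltas)))
    print("SNR (dB):", snr_db(tone, 0.1 * tone))
```

Per-utterance descriptors of this kind could then be aggregated per corpus and compared across domains, which is the sort of analysis the framework describes at a high level.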