ASSD Nov 15, 2021

Investigating self-supervised front ends for speech spoofing countermeasures

arXiv:2111.07725v3, 188 citations
Originality Synthesis-oriented
AI Analysis

This work addresses speech anti-spoofing for security applications, but it is incremental as it applies existing self-supervised models to a known task.

The study tackled the problem of speech spoofing countermeasures by using pre-trained self-supervised speech models as front ends, achieving low equal error rates and outperforming baselines on multiple ASVspoof test sets.

Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks. For speech anti-spoofing, most countermeasures (CMs) use signal processing algorithms to extract acoustic features for classification. In this study, we use pre-trained self-supervised speech models as the front end of spoofing CMs. We investigated different back end architectures to be combined with the self-supervised front end, the effectiveness of fine-tuning the front end, and the performance of using different pre-trained self-supervised models. Our findings showed that, when a good pre-trained front end was fine-tuned with either a shallow or a deep neural network-based back end on the ASVspoof 2019 logical access (LA) training set, the resulting CM not only achieved a low equal error rate (EER) on the 2019 LA test set but also significantly outperformed the baseline on the ASVspoof 2015, 2021 LA, and 2021 deepfake test sets. A sub-band analysis further demonstrated that the CM mainly used the information in a specific frequency band to discriminate between bona fide and spoofed trials across the test sets.
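The abstract reports results as equal error rates. The EER is the operating point at which the false acceptance rate (spoofed trials accepted) equals the false rejection rate (bona fide trials rejected). The following is a minimal sketch of how such a value could be approximated from CM scores; it is not the paper's or the ASVspoof challenge's evaluation code. The function name `equal_error_rate`, the example scores, and the convention that a higher score means "more likely bona fide" are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means the trial is more likely bona fide.
    This is a simple illustrative approximation, not official ASVspoof tooling.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score in each class")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.2]
    spoof = [0.1, 0.3, 0.4, 0.6]
    print(f"EER: {equal_error_rate(bonafide, spoof):.2%}")
```

With finitely many trials the two error rates rarely cross exactly, so the sketch returns the average of the two rates at the threshold where they are closest. Evaluation toolkits may interpolate differently and therefore report slightly different values.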

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
