AS | CL | CR | Oct 7, 2022

Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture

arXiv:2210.03581v2 | 12 citations | h-index: 101
Originality: Incremental advance
AI Analysis

This work addresses spoofing threats in voice authentication and audio integrity, offering incremental improvements in detection methods.

The paper tackles synthetic voice and audio splicing detection by proposing an SE-Res2Net-Conformer architecture for spoofing countermeasures and reformulating splicing detection to focus on boundary identification. Results on the ASVspoof 2019 database show improved performance for logical access scenarios.

Synthetic voice and spliced audio clips have been generated to spoof Internet users and artificial intelligence (AI) technologies such as voice authentication. Existing research treats spoofing countermeasures as a binary classification problem: bonafide vs. spoof. This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit the local patterns in acoustic features. Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture improves spoofing countermeasure performance in the logical access scenario. In addition, this paper proposes to reformulate the existing audio splicing detection problem: instead of identifying the complete spliced segments, it is more useful to detect their boundaries. Moreover, a deep learning approach can be used to solve this problem, in contrast to the previous signal processing techniques.
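To make the reformulation of splicing detection concrete, the sketch below contrasts the two labeling schemes at frame level: marking every frame inside a spliced segment versus marking only the frames at the segment boundaries. This is an illustration under stated assumptions, not the paper's labeling procedure: the function names, the half-open frame-index convention for segments, and the `tolerance` parameter are all hypothetical.

```python
from typing import List, Tuple

# Assumption: each spliced segment is a half-open frame range [start, end).
Segment = Tuple[int, int]


def segment_labels(num_frames: int, segments: List[Segment]) -> List[int]:
    """Label 1 for every frame inside a spliced segment, else 0."""
    labels = [0] * num_frames
    for start, end in segments:
        for i in range(max(start, 0), min(end, num_frames)):
            labels[i] = 1
    return labels


def boundary_labels(
    num_frames: int, segments: List[Segment], tolerance: int = 0
) -> List[int]:
    """Label 1 for frames within `tolerance` frames of a splice point, else 0."""
    labels = [0] * num_frames
    for start, end in segments:
        for point in (start, end):
            # A point at the very start or end of the clip is the clip edge,
            # not a splice, so it is skipped.
            if point <= 0 or point >= num_frames:
                continue
            lo = max(point - tolerance, 0)
            hi = min(point + tolerance, num_frames - 1)
            for i in range(lo, hi + 1):
                labels[i] = 1
    return labels


if __name__ == "__main__":
    spliced = [(3, 6)]  # hypothetical spliced region in a 10-frame clip
    print(segment_labels(10, spliced))
    print(boundary_labels(10, spliced))
```

Under the boundary formulation the positive class is much sparser than under the segment formulation, which is one reason a tolerance window around each splice point is a common design choice when training a frame-level detector.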

