SDAIASOct 18, 2021

FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

arXiv:2110.09441v136 citations
Originality Synthesis-oriented
AI Analysis

This addresses the need for Mandarin-specific datasets in synthetic speech detection, which is an incremental contribution to forensic audio research.

The authors tackled the problem of detecting synthetic Mandarin speech by constructing the FMFCC-A dataset, which is the largest publicly-available Mandarin dataset with 40,000 synthesized and 10,000 genuine utterances, and they provided baseline analyses and top-performing submissions to demonstrate its utility and challenges.

As increasing development of text-to-speech (TTS) and voice conversion (VC) technologies, the detection of synthetic speech has been suffered dramatically. In order to promote the development of synthetic speech detection model against Mandarin TTS and VC technologies, we have constructed a challenging Mandarin dataset and organized the accompanying audio track of the first fake media forensic challenge of China Society of Image and Graphics (FMFCC-A). The FMFCC-A dataset is by far the largest publicly-available Mandarin dataset for synthetic speech detection, which contains 40,000 synthesized Mandarin utterances that generated by 11 Mandarin TTS systems and two Mandarin VC systems, and 10,000 genuine Mandarin utterances collected from 58 speakers. The FMFCC-A dataset is divided into the training, development and evaluation sets, which are used for the research of detection of synthesized Mandarin speech under various previously unknown speech synthesis systems or audio post-processing operations. In addition to describing the construction of the FMFCC-A dataset, we provide a detailed analysis of two baseline methods and the top-performing submissions from the FMFCC-A, which illustrates the usefulness and challenge of FMFCC-A dataset. We hope that the FMFCC-A dataset can fill the gap of lack of Mandarin datasets for synthetic speech detection.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes