SD AI CL AS Apr 8, 2021

Half-Truth: A Partially Fake Audio Detection Dataset

Jiangyan Yi, Ye Bai, Jianhua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu

arXiv:2104.03617v2 26.3127 citations

Originality Synthesis-oriented

AI Analysis

This work addresses a critical security threat in audio forensics by providing a dataset for detecting subtle manipulations, though the contribution is incremental, since it builds on existing fake audio detection efforts.

The paper tackles the problem of detecting partially fake audio, where small fake clips are hidden in real speech, by developing the HAD dataset. The results show that partially fake audio is much more challenging to detect than fully fake audio, with benchmark tests demonstrating this difficulty.

Diverse promising datasets, such as the ASVspoof databases, have been designed to advance the development of fake audio detection. However, previous datasets ignore one attack scenario, in which the attacker hides a few small fake clips in real speech audio. This poses a serious threat, since it is difficult to distinguish a small fake clip from the rest of the utterance. Therefore, this paper develops such a dataset for half-truth audio detection (HAD). Partially fake audio in the HAD dataset involves changing only a few words in an utterance. The audio of those words is generated with the latest state-of-the-art speech synthesis technology. Using this dataset, we can not only detect fake utterances but also localize the manipulated regions in the speech. Some benchmark results are presented on this dataset. The results show that partially fake audio is much more challenging than fully fake audio for fake audio detection. The HAD dataset is publicly available: https://zenodo.org/records/10377492.
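To make the idea of a partially fake utterance concrete, the following is a minimal sketch, not the authors' actual pipeline. It assumes a real waveform and a synthesized clip are already available as NumPy arrays at the same sample rate, replaces a span of the real audio with the clip, and derives the per-sample and per-frame labels that a localization model could be trained on. The function names, the frame length, and the hop size are hypothetical choices for illustration only.

```python
import numpy as np


def splice_fake_segment(real, fake, start, end):
    """Replace real[start:end] with `fake`.

    Returns the spliced waveform and a per-sample label array
    (1 = synthesized sample, 0 = genuine sample).
    Hypothetical helper, not part of the HAD tooling.
    """
    if not (0 <= start <= end <= len(real)):
        raise ValueError("segment bounds must lie inside the real waveform")
    mixed = np.concatenate([real[:start], fake, real[end:]])
    labels = np.zeros(len(mixed), dtype=np.int8)
    labels[start:start + len(fake)] = 1
    return mixed, labels


def frame_labels(sample_labels, frame_len, hop):
    """Mark a frame as fake if any sample inside it is fake."""
    n_frames = 1 + max(0, len(sample_labels) - frame_len) // hop
    return np.array(
        [int(sample_labels[i * hop:i * hop + frame_len].any())
         for i in range(n_frames)],
        dtype=np.int8,
    )


# Toy data standing in for real speech and a synthesized word (assumed 16 kHz).
real = np.zeros(16000, dtype=np.float32)
fake = np.ones(4000, dtype=np.float32)

mixed, sample_lab = splice_fake_segment(real, fake, start=8000, end=11000)
frames = frame_labels(sample_lab, frame_len=400, hop=160)

# Utterance-level label for detection: fake if any region was manipulated.
utterance_is_fake = int(sample_lab.any())
```

The utterance-level label corresponds to the detection task, while the frame-level labels correspond to localizing the manipulated region. A real pipeline would also need to handle boundary smoothing and loudness matching, which this sketch omits.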
