CV, AI | Oct 16, 2023

GreatSplicing: A Semantically Rich Splicing Dataset

Jiaming Liang, Yuwan Xue, Haowei Liu, Zhenqi Dai, Yu Liao, Rui Wang, Weihao Jiang, Yaping Liu, Zhikun Chen, Guoxiao Liu, Bo Liu, Xiuli Bi

arXiv:2310.10070v3 | 1.5 | h-index: 12

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for a standardized benchmark dataset in image splicing detection, though it is incremental, building on existing data collection efforts.

The authors tackle the insufficient semantic variety of splicing forgery datasets, which causes detection models to overfit to semantic features. Their answer is GreatSplicing, a manually curated dataset of 5,000 spliced images whose spliced regions span 335 semantic categories. Models trained on it achieve low misidentification rates and improved cross-dataset generalization.

In existing splicing forgery datasets, the insufficient semantic variety of spliced regions causes trained detection models to overfit to semantic features rather than learn genuine splicing traces. Meanwhile, the lack of a reasonable benchmark dataset has led to inconsistent experimental settings across existing detection methods. To address these issues, we propose GreatSplicing, a manually created, large-scale, high-quality splicing dataset. GreatSplicing comprises 5,000 spliced images and covers spliced regions across 335 distinct semantic categories, enabling detection models to learn splicing traces more effectively. Empirical results show that detection models trained on GreatSplicing achieve lower misidentification rates and stronger cross-dataset generalization than models trained on existing datasets. GreatSplicing is now publicly available for research purposes at the following link.
