CV, Jan 3, 2024

Synthetic dataset of ID and Travel Document

arXiv:2401.01858v1, 4 citations, h-index: 17
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity and privacy concerns for researchers in document image analysis, though it is incremental in that it builds on existing detection methods.

The authors tackle the lack of public datasets for forged ID document detection by creating SIDTD, a synthetic dataset. They train state-of-the-art models on it and compare the results with the performance achieved on larger private datasets.

This paper presents a new synthetic dataset of ID and travel documents, called SIDTD. The SIDTD dataset is created to help train and evaluate forged ID document detection systems. Such a dataset has become a necessity because ID documents contain personal information, so a public dataset of real documents cannot be released. Moreover, forged documents are scarce compared to legitimate ones, and the way they are generated varies from one fraudster to another, resulting in a class with high intra-class variability. In this paper we train state-of-the-art models on this dataset and compare their performance with that achieved on larger, but private, datasets. The creation of this dataset will help the document image analysis community progress in the task of ID document verification.

Code Implementations: 1 repo