CV, Jan 3, 2024

Synthetic dataset of ID and Travel Document

Carlos Boned, Maxime Talarmain, Nabil Ghanmi, Guillaume Chiron, Sanket Biswas, Ahmad Montaser Awal, Oriol Ramos Terrades

arXiv:2401.01858v1, 6.54 citations, h-index: 17, Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses data scarcity and privacy concerns facing researchers in document image analysis, though it is incremental because it builds on existing detection methods.

The authors tackled the lack of public datasets for forged ID document detection by creating SIDTD, a synthetic dataset, and trained state-of-the-art models on it, achieving performance comparable to that achieved on larger private datasets.

This paper presents a new synthetic dataset of ID and travel documents, called SIDTD. The SIDTD dataset was created to help train and evaluate systems that detect forged ID documents. Such a dataset has become a necessity because ID documents contain personal information, so a public dataset of real documents cannot be released. Moreover, forged documents are scarce compared to legitimate ones, and the way they are generated varies from one fraudster to another, resulting in a class with high intra-class variability. In this paper we train state-of-the-art models on this dataset and compare their performance to that achieved on larger, but private, datasets. The creation of this dataset will help the document image analysis community progress in the task of ID document verification.
