Layout-Aware Representation Learning for Open-Set ID Fraud Discovery

Jinxing Li, Nicholas Ren, Cathy Chang, Hongkai Pan, Daniel George

arXiv:2605.05215


AI Analysis

For identity-document fraud detection, this work offers a production-aligned approach to discovering novel fraud patterns under distribution shift, which closed-set classification cannot do because it only recognizes the classes it was trained on.

The paper tackles open-set ID fraud discovery by learning layout-aware document embeddings that generalize to unseen layouts and surface adaptive fraud campaigns. Trained on U.S. IDs only, the method reaches 99.83% layout classification accuracy on Canadian layouts and surfaces 276 adaptive physical-fraud cases among 20,448 Canadian IDs, 222 of which incumbent detectors had missed.

Identity-document fraud detection is not a stationary binary classification problem. Adaptive attackers modify templates and fabrication pipelines, making historical fraud labels stale, and successful forgeries recur at scale as coherent campaigns. We therefore study layout-aware representation learning for open-set fraud discovery rather than only closed-set classification. We adapt DINOv3 to the document domain via context-aware SimMIM fine-tuning and supervised metric learning with a composite loss that encourages inter-class separability and intra-class compactness. The model is trained on U.S. IDs only. With a lightweight MLP and softmax classifier, the embedding achieves 99.83% layout classification accuracy on Canadian layouts. Moreover, on a dataset of 20,448 Canadian IDs, embedding-space analysis surfaces 276 adaptive physical-fraud cases, including 222 not surfaced by incumbent detectors. The embedding supports similarity-based expansion from a single confirmed seed to additional related cases not linked by conventional metadata graphs. The layout-aware document embeddings provide a production-aligned basis for discovering novel and campaign-scale fraud under distribution shift.
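The similarity-based expansion from a single confirmed seed can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the threshold, or the retrieval procedure, so everything here is an assumption made for illustration: cosine similarity over precomputed embeddings, a hypothetical function name `expand_from_seed`, and an arbitrary default threshold of 0.9.

```python
import numpy as np


def expand_from_seed(embeddings, seed_index, threshold=0.9, top_k=None):
    """Rank documents by cosine similarity to one confirmed fraud seed.

    Hypothetical helper, not the authors' procedure. `embeddings` is an
    (n_documents, dim) array of document embeddings; `threshold` and
    `top_k` are illustrative parameters.
    Returns (candidate indices sorted by similarity, all similarities).
    """
    emb = np.asarray(embeddings, dtype=float)

    # L2-normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)

    # Cosine similarity of every document to the seed document.
    sims = unit @ unit[seed_index]

    # Sort from most to least similar and drop the seed itself.
    order = np.argsort(-sims)
    order = order[order != seed_index]

    # Keep only candidates at or above the similarity threshold.
    hits = [int(i) for i in order if sims[i] >= threshold]
    if top_k is not None:
        hits = hits[:top_k]
    return hits, sims


if __name__ == "__main__":
    # Toy 2-D embeddings: document 1 is close to the seed (document 0).
    toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    candidates, _ = expand_from_seed(toy, seed_index=0)
    print(candidates)
```

In practice the candidates returned this way would go to human review rather than be treated as confirmed fraud, and the threshold would need tuning against labeled cases.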
