LG · DB · Aug 27, 2025

Robust Detection of Synthetic Tabular Data under Schema Variability

arXiv:2509.00092v1 · h-index: 19
Originality: Incremental advance
AI Analysis

The paper addresses the underexplored problem of data authenticity for tabular data, which matters for any application that relies on real-world tables. The advance is incremental, since it builds on a prior baseline.

The paper tackles the detection of synthetic tabular data whose schemas vary and may be unseen at test time. It introduces a datum-wise transformer architecture that improves both AUC and accuracy by 7 points over the prior baseline, and a table-adaptation component adds a further 7 accuracy points.

The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet detecting synthetic tabular data is especially challenging because of its heterogeneous structure and the unseen formats encountered at test time. We address the underexplored task of detecting synthetic tabular data in the wild, where tables have variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. By incorporating a table-adaptation component, our model gains an additional 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence that detecting synthetic tabular data in real-world conditions is not only feasible but also achievable with high reliability.
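The abstract does not spell out how a datum-wise transformer copes with unseen schemas. One plausible reading, which is an assumption here and not the authors' documented method, is that each cell (datum) becomes its own token carrying both its column name and its value, so tables with any number of columns map to one shared token format that a transformer can attend over with a padding mask. The sketch below illustrates only that input-preparation step in plain Python; the names `row_to_datums`, `batch_rows`, `DatumBatch`, and the `<pad>`/`<missing>` markers are hypothetical, and embedding, attention, and table adaptation are omitted.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical padding token used to fill short rows up to the batch width.
PAD = ("<pad>", "<pad>")


@dataclass
class DatumBatch:
    # Per row: a list of (column_name, value_text) datum tokens, padded to a common width.
    tokens: list[list[tuple[str, str]]]
    # True where the slot holds a real datum, False where it holds padding.
    mask: list[list[bool]]


def row_to_datums(row: dict[str, Any]) -> list[tuple[str, str]]:
    """Turn one table row into (column, value) datum tokens.

    The column name travels with every value, so rows from tables with
    different schemas share one token format. None becomes a missing marker.
    """
    return [
        (str(col), "<missing>" if val is None else str(val))
        for col, val in row.items()
    ]


def batch_rows(rows: list[dict[str, Any]]) -> DatumBatch:
    """Tokenize rows with possibly different schemas and pad them to one width."""
    datum_lists = [row_to_datums(r) for r in rows]
    width = max((len(d) for d in datum_lists), default=0)
    tokens, mask = [], []
    for datums in datum_lists:
        n_pad = width - len(datums)
        tokens.append(datums + [PAD] * n_pad)
        mask.append([True] * len(datums) + [False] * n_pad)
    return DatumBatch(tokens, mask)


if __name__ == "__main__":
    # Two rows from tables with unrelated schemas (illustrative values only).
    census_row = {"age": 39, "workclass": "State-gov", "income": ">50K"}
    sales_row = {"store_id": 7, "revenue": 1520.5}
    batch = batch_rows([census_row, sales_row])
    print(batch.mask)
    print(batch.tokens[1])
```

Here a three-column census row and a two-column sales row land in one batch of width 3, and the mask marks the sales row's third slot as padding so a downstream encoder can ignore it. Keeping the column name inside each token, rather than relying on column position, is what would let such a model read a schema it never saw during training.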

