Datum-wise Transformer for Synthetic Tabular Data Detection in the Wild
This work addresses the detection of synthetic tabular data, a critical issue for industry and government, but it is incremental in that it builds on existing detection methods developed for other media types.
The paper tackles the problem of detecting synthetic tabular data in real-world scenarios where table structures vary widely, introducing a novel datum-wise transformer architecture that outperforms existing models and incorporates domain adaptation for robustness.
The growing power of generative models raises major concerns about the authenticity of published content. To address this problem, several synthetic content detection methods have been proposed for uniformly structured media such as images or text. However, little work has been done on the detection of synthetic tabular data, despite its importance in industry and government. This form of data is complex to handle due to the diversity of its structures: the number and types of the columns may vary wildly from one table to another. We tackle the tough problem of detecting synthetic tabular data "in the wild", i.e., when the model is deployed on table structures it has never seen before. We introduce a novel datum-wise transformer architecture and show that it outperforms existing models. Furthermore, we investigate the application of domain adaptation techniques to enhance the effectiveness of our model, thereby providing a more robust data-forgery detection solution.
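The abstract does not specify how the datum-wise transformer encodes a table. The sketch below is a minimal illustration that assumes each cell (datum) becomes a single `column=value` token. This encoding is hypothetical and is not necessarily the authors' method. It shows why a datum-level view lets one model accept tables whose number and types of columns differ: each row becomes a variable-length token sequence, and sequences can then be padded and masked into one batch. The names `row_to_datum_tokens`, `pad_batch` and `PAD` are illustrative only.

```python
from typing import Any, Dict, List, Tuple

PAD = "[PAD]"  # hypothetical padding token


def row_to_datum_tokens(row: Dict[str, Any]) -> List[str]:
    # Hypothetical encoding: one token per datum. The column name is kept
    # so that each value stays tied to the field it came from.
    return [f"{column}={value}" for column, value in row.items()]


def pad_batch(rows: List[Dict[str, Any]]) -> Tuple[List[List[str]], List[List[int]]]:
    # Rows from differently structured tables produce sequences of different
    # lengths. Pad them to a common length and build a mask in which
    # 1 marks a real datum and 0 marks padding.
    if not rows:
        return [], []
    sequences = [row_to_datum_tokens(r) for r in rows]
    max_len = max(len(s) for s in sequences)
    padded = [s + [PAD] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


if __name__ == "__main__":
    # Two toy rows from tables with different structures (illustrative data).
    row_a = {"age": 39, "workclass": "State-gov", "hours": 40}
    row_b = {"price": 12.5, "store": "A"}
    tokens, mask = pad_batch([row_a, row_b])
    print(tokens)  # [['age=39', 'workclass=State-gov', 'hours=40'], ['price=12.5', 'store=A', '[PAD]']]
    print(mask)    # [[1, 1, 1], [1, 1, 0]]
```

In an actual transformer, these tokens would be mapped to embeddings, and the mask would stop attention from using padded positions. Because the model sees only a sequence of data, it is not tied to any fixed column layout.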