LG AI, Mar 26, 2025

Assessing Generative Models for Structured Data

arXiv:2503.20903v1, 2 citations, h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses the need for better evaluation of synthetic data generation, which is used for applications such as data augmentation and privacy protection. The advance is incremental because it focuses on assessment rather than generation.

The paper tackles the problem of evaluating synthetic tabular data quality by introducing rigorous methods for assessing inter-column dependencies. It finds that a large language model (GPT-2) and a GAN (CTGAN) fail to replicate the dependencies in the original data.

Synthetic tabular data generation has emerged as a promising method to address limited data availability and privacy concerns. With the sharp increase in the performance of large language models in recent years, researchers have been interested in applying these models to the generation of tabular data. However, little is known about the quality of the generated tabular data from large language models. The predominant method for assessing the quality of synthetic tabular data is the train-synthetic-test-real approach, where the artificial examples are compared to the original by how well machine learning models, trained separately on the real and synthetic sets, perform in some downstream tasks. This method does not directly measure how closely the distribution of generated data approximates that of the original. This paper introduces rigorous methods for directly assessing synthetic tabular data against real data by looking at inter-column dependencies within the data. We find that large language models (GPT-2), both when queried via few-shot prompting and when fine-tuned, and GAN (CTGAN) models do not produce data with dependencies that mirror the original real data. Results from this study can inform future practice in synthetic data generation to improve data quality.
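To make the idea of comparing inter-column dependencies concrete, here is a minimal sketch. It is not the paper's method, which this summary does not spell out. As an assumption, it uses pairwise Pearson correlation as a simple stand-in for "dependency". The helper names (`pearson`, `pairwise_dependencies`, `dependency_gap`) and the toy tables are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns.

    Assumes neither column is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def pairwise_dependencies(table):
    """Map each unordered column pair to its Pearson correlation.

    `table` is a dict of column name -> list of numeric values.
    """
    return {
        (a, b): pearson(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


def dependency_gap(real, synthetic):
    """Largest absolute difference in pairwise correlation between two tables.

    Assumes both tables share the same column names.
    """
    real_deps = pairwise_dependencies(real)
    synth_deps = pairwise_dependencies(synthetic)
    return max(abs(real_deps[pair] - synth_deps[pair]) for pair in real_deps)


# Toy data (hypothetical): the synthetic table has the same per-column
# values as the real one, but the link between columns "a" and "b" is broken.
real = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8]}
synthetic = {"a": [1, 2, 3, 4], "b": [4, 8, 2, 6]}

print(dependency_gap(real, synthetic))  # large gap: dependency not preserved
print(dependency_gap(real, real))       # no gap when the tables are identical
```

In this toy case each column of the synthetic table has exactly the same marginal distribution as the real one, so a column-by-column check would pass, yet the dependency between the columns is lost. That is the kind of mismatch a direct dependency comparison is meant to expose. Pearson correlation captures only linear relationships between numeric columns; categorical or nonlinear dependencies would need a different measure.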

