LG, Jan 21, 2025

TabularARGN: A Flexible and Efficient Auto-Regressive Framework for Generating High-Fidelity Synthetic Data

arXiv:2501.12012v2 | 14 citations | h-index: 5
AI Analysis

The work addresses the need for efficient, high-fidelity synthetic data generation in real-world applications across industries, and it presents a novel method rather than an incremental improvement.

The paper tackles synthetic data generation for tabular datasets by introducing TabularARGN, a flexible auto-regressive framework for mixed-type, multivariate, and sequential data that achieves state-of-the-art quality while significantly reducing training and inference times.

Synthetic data generation for tabular datasets must balance fidelity, efficiency, and versatility to meet the demands of real-world applications. We introduce the Tabular Auto-Regressive Generative Network (TabularARGN), a flexible framework designed to handle mixed-type, multivariate, and sequential datasets. By training on all possible conditional probabilities, TabularARGN supports advanced features such as fairness-aware generation, imputation, and conditional generation on any subset of columns. The framework achieves state-of-the-art synthetic data quality while significantly reducing training and inference times, making it ideal for large-scale datasets with diverse structures. Evaluated across established benchmarks, including realistic datasets with complex relationships, TabularARGN demonstrates its capability to synthesize high-quality data efficiently. By unifying flexibility and performance, this framework paves the way for practical synthetic data generation across industries.
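Why training on all conditional probabilities buys this flexibility can be illustrated with a toy sampler. For any column order σ, a row factorizes as p(x) = ∏_i p(x_σ(i) | x_σ(1), …, x_σ(i-1)), so a model that knows every such conditional can place the columns you want to condition on first and generate the rest. The sketch below is a minimal illustration under stated assumptions: it uses a small hypothetical categorical table, and empirical counts stand in for TabularARGN's neural network. The table, column names, and the functions `conditional` and `sample_row` are hypothetical. This is not the paper's implementation or the API of its released code.

```python
import random
from collections import Counter

# Hypothetical toy training table: each row maps column name -> categorical value.
TRAIN = [
    {"age": "young", "income": "low", "owns_home": "no"},
    {"age": "young", "income": "mid", "owns_home": "no"},
    {"age": "old", "income": "mid", "owns_home": "yes"},
    {"age": "old", "income": "high", "owns_home": "yes"},
    {"age": "old", "income": "low", "owns_home": "no"},
]
COLUMNS = ["age", "income", "owns_home"]


def conditional(rows, target, context):
    """Empirical p(target | context), counted over rows that match the context.

    Assumption: counting stands in for a learned network; TabularARGN uses a neural model.
    """
    matches = [r for r in rows if all(r[c] == v for c, v in context.items())]
    if not matches:
        raise ValueError(f"context {context} never occurs in the training data")
    counts = Counter(r[target] for r in matches)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def sample_row(rows, columns, rng, fixed=None):
    """Generate one row auto-regressively in a random column order.

    Columns given in `fixed` come first and are kept as-is (conditional generation
    on any subset of columns). The remaining columns are shuffled, and each one is
    sampled from p(column | all columns generated so far).
    """
    row = dict(fixed or {})
    free = [c for c in columns if c not in row]
    rng.shuffle(free)  # any order is valid when every conditional is available
    for col in free:
        probs = conditional(rows, col, row)
        values = list(probs)
        row[col] = rng.choices(values, weights=[probs[v] for v in values])[0]
    return {c: row[c] for c in columns}


rng = random.Random(0)
print(sample_row(TRAIN, COLUMNS, rng))                        # unconditional sample
print(sample_row(TRAIN, COLUMNS, rng, fixed={"age": "old"}))  # conditioned on age
```

Passing the observed fields of an incomplete record as `fixed` fills in only the missing fields, which is the imputation case. Fairness-aware generation is not covered by this sketch.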

Code Implementations: 2 repos