LG | AI | Aug 15, 2025

FairTabGen: Unifying Counterfactual and Causal Fairness in Synthetic Tabular Data Generation

arXiv:2508.11810v1 | 1 citation | h-index: 20
Originality: Incremental advance
AI Analysis

The paper addresses fairness issues in synthetic data generation for privacy-sensitive, data-scarce settings. It is an incremental advance that integrates multiple fairness definitions into an LLM-based framework.

It tackles the problem of improving counterfactual and causal fairness in synthetic tabular data generation while preserving utility, and reports up to 10% improvements on fairness metrics while using less than 20% of the original data.

Generating synthetic data is crucial in privacy-sensitive, data-scarce settings, especially for tabular datasets widely used in real-world applications. A key challenge is improving counterfactual and causal fairness while preserving high utility. We present FairTabGen, a fairness-aware large language model-based framework for tabular synthetic data generation. We integrate multiple fairness definitions, including counterfactual and causal fairness, into both the generation and evaluation pipelines. We use in-context learning, prompt refinement, and fairness-aware data curation to balance fairness and utility. Across diverse datasets, our method outperforms state-of-the-art GAN-based and LLM-based methods, achieving up to 10% improvements on fairness metrics such as demographic parity and path-specific causal effects while retaining statistical utility. Remarkably, it achieves these gains using less than 20% of the original data, highlighting its efficiency in low-data regimes. These results demonstrate a principled and practical approach for generating fair and useful synthetic tabular data.
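Demographic parity, one of the fairness metrics named in the abstract, can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function name `demographic_parity_difference`, the toy predictions, and the group labels are all assumptions made for illustration. It uses the common definition of the metric as the largest gap in positive-prediction rate between groups defined by a sensitive attribute.

```python
from collections import defaultdict


def demographic_parity_difference(predictions, groups):
    """Return (gap, rates): the largest gap in positive-prediction rate
    between any two groups, and the per-group rates.

    Hypothetical helper for illustration; not taken from FairTabGen.
    """
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    totals = defaultdict(int)     # number of rows per group
    positives = defaultdict(int)  # number of positive predictions per group
    for y_hat, group in zip(predictions, groups):
        totals[group] += 1
        if y_hat == 1:
            positives[group] += 1

    # Positive-prediction rate P(y_hat = 1 | group) for each group.
    rates = {group: positives[group] / totals[group] for group in totals}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Toy data (assumed): binary predictions and a binary sensitive attribute.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(preds, sensitive)
print(rates)  # per-group positive rates
print(gap)    # 0 means equal rates across groups
```

A gap of 0 indicates that all groups receive positive predictions at the same rate. A value like this could be computed for a model trained on real data and for one trained on synthetic data, then compared.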

