LG · Apr 12, 2024

Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models

Zeyu Yang, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, Akane Sano

arXiv:2404.08254v3 · 13 citations · h-index: 5 · Has Code · Trans. Mach. Learn. Res.

Originality: Incremental advance

AI Analysis

This work addresses fairness in tabular data synthesis for machine learning applications, where biased training data can lead to discriminatory outcomes. It represents an incremental advance in fairness-aware generative models.

The paper tackles bias in tabular data synthesis by introducing a diffusion model with sensitive guidance. The model generates fair synthetic data with balanced joint distributions of target labels and sensitive attributes such as sex and race, and reports improvements of over 10% on fairness metrics such as demographic parity ratio and equalized odds ratio.

Diffusion models have emerged as a robust framework for various generative tasks, including tabular data synthesis. However, current tabular diffusion models tend to inherit bias in the training dataset and generate biased synthetic data, which may influence discriminatory actions. In this research, we introduce a novel tabular diffusion model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes, such as sex and race. The empirical results demonstrate that our method effectively mitigates bias in training data while maintaining the quality of the generated samples. Furthermore, we provide evidence that our approach outperforms existing methods for synthesizing tabular data on fairness metrics such as demographic parity ratio and equalized odds ratio, achieving improvements of over $10\%$. Our implementation is available at https://github.com/comp-well-org/fair-tab-diffusion.
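The two fairness metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary 0/1 labels and predictions and follows the commonly used definitions (the same ones Fairlearn documents): demographic parity ratio is the smallest group selection rate divided by the largest, and equalized odds ratio is the smaller of the between-group true-positive-rate ratio and false-positive-rate ratio. The function names and toy data below are hypothetical and are not taken from the paper's repository; the paper's exact evaluation protocol may differ.

```python
from collections import defaultdict


def _ratio(rates):
    """Smallest rate divided by the largest; NaN if every rate is zero."""
    hi = max(rates)
    return min(rates) / hi if hi > 0 else float("nan")


def _group_positive_rates(y_pred, sensitive, keep=None):
    """Mean prediction per sensitive group, optionally over a subset of rows.

    Groups that have no rows left after filtering are skipped.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for i, (pred, group) in enumerate(zip(y_pred, sensitive)):
        if keep is not None and not keep[i]:
            continue
        totals[group] += 1
        positives[group] += pred
    return [positives[g] / totals[g] for g in totals]


def demographic_parity_ratio(y_pred, sensitive):
    """Min/max ratio of per-group selection rates (1.0 means parity)."""
    return _ratio(_group_positive_rates(y_pred, sensitive))


def equalized_odds_ratio(y_true, y_pred, sensitive):
    """Smaller of the per-group TPR ratio and FPR ratio (1.0 means parity)."""
    tpr = _group_positive_rates(y_pred, sensitive, keep=[t == 1 for t in y_true])
    fpr = _group_positive_rates(y_pred, sensitive, keep=[t == 0 for t in y_true])
    return min(_ratio(tpr), _ratio(fpr))


# Toy example (made-up data): group "a" is selected more often than group "b".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(round(demographic_parity_ratio(y_pred, sensitive), 3))  # 0.667
print(equalized_odds_ratio(y_true, y_pred, sensitive))        # 0.5
```

Both ratios lie in [0, 1], and values closer to 1 indicate more similar treatment across sensitive groups. In a synthetic-data setting, such metrics would typically be computed on a held-out real test set for a classifier trained on the synthetic data, though that setup is an assumption here rather than something the abstract specifies.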

