LG AI CY | Apr 10, 2025

Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review

Nazia Nafis, Inaki Esnaola, Alvaro Martinez-Perez, Maria-Cruz Villa-Uriol, Venet Osmani

arXiv:2504.18544v24.12 citations | h-index: 28

Originality: Synthesis-oriented

AI Analysis

It addresses the critical need for reliable evaluation methods in synthetic data generation, especially for health applications, but is incremental as it synthesizes existing research rather than introducing new techniques.

This systematic review tackled the problem of evaluating synthetic tabular data, particularly in health contexts, by identifying key challenges such as lack of consensus on methods and limited reproducibility, and provided guidelines to improve evaluation practices.

Generating synthetic tabular data can be challenging; however, evaluating its quality is just as challenging, if not more so. This systematic review sheds light on the critical importance of rigorous evaluation of synthetic health data to ensure its reliability, relevance, and appropriate use. Based on a screening of 1766 papers and a detailed review of 101 papers, we identified key challenges, including lack of consensus on evaluation methods, improper use of evaluation metrics, limited input from domain experts, inadequate reporting of dataset characteristics, and limited reproducibility of results. In response, we provide several guidelines on the generation and evaluation of synthetic data, to allow the community to unlock and fully harness the transformative potential of synthetic data and accelerate innovation.
