LG, AI | Sep 26, 2024

Preserving logical and functional dependencies in synthetic tabular data

Chaithra Umesh, Kristian Schultz, Manjunath Mahendra, Saptarshi Bej, Olaf Wolkenhauer

arXiv:2409.17684v1 | 6.4 | 1 citation | h-index: 7 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a gap in synthetic tabular data generation. The advance is incremental: it builds on existing methods by introducing logical dependencies as a new dependency type, together with a measure to quantify them.

The paper asks whether synthetic tabular data generation algorithms preserve dependencies among attributes. It introduces logical dependencies and a measure to quantify them, and finds that current algorithms do not fully preserve functional dependencies, although some can preserve logical dependencies.

Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data has yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article. Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. Utilizing this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. In addition, we show that some tabular synthetic data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state-of-the-art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.
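To make the notion of a functional dependency concrete, the following is a minimal sketch of how one might check whether a dependency A -> B that holds in a real table still holds in a synthetic one. It is not the authors' code and does not implement their logical-dependency measure. The function name `functional_dependency_holds`, the column names, and the toy rows are all hypothetical and chosen only for illustration.

```python
def functional_dependency_holds(rows, determinant, dependent):
    """Return True if each value of `determinant` maps to exactly one
    value of `dependent` across all rows (i.e. determinant -> dependent)."""
    seen = {}  # determinant value -> first dependent value observed
    for row in rows:
        key = row[determinant]
        value = row[dependent]
        if key in seen and seen[key] != value:
            return False  # same determinant, different dependent: FD violated
        seen.setdefault(key, value)
    return True


# Hypothetical toy tables: in the real data, city determines country.
real = [
    {"city": "Rostock", "country": "Germany"},
    {"city": "Rostock", "country": "Germany"},
    {"city": "Kochi", "country": "India"},
]
# A synthetic table in which a generator has broken that dependency.
synthetic = [
    {"city": "Rostock", "country": "Germany"},
    {"city": "Rostock", "country": "India"},
    {"city": "Kochi", "country": "India"},
]

real_ok = functional_dependency_holds(real, "city", "country")
synthetic_ok = functional_dependency_holds(synthetic, "city", "country")
print(real_ok, synthetic_ok)
```

In this toy case the dependency holds in the real rows but not in the synthetic rows, which is the kind of failure the abstract describes for functional dependencies.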

