Preserving logical and functional dependencies in synthetic tabular data
This work addresses a gap in synthetic tabular data generation. The contribution is incremental: it builds on existing methods by introducing new dependency types and evaluation metrics.
The paper asks whether synthetic tabular data generation algorithms preserve dependencies among attributes. It introduces logical dependencies and a measure to quantify them, and finds that current algorithms do not fully preserve functional dependencies, although some can preserve logical dependencies.
Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies when generating synthetic data has yet to be explored. In this article, we introduce the notion of logical dependencies among attributes, in addition to the existing notion of functional dependencies. Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. Using this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets. We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. In addition, we show that some synthetic tabular data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state of the art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.
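To make the notion of functional dependency preservation concrete, the following is a minimal sketch and not the procedure used in the article. It assumes the textbook definition (A -> B holds when every value of A co-occurs with exactly one value of B), considers only single-attribute dependencies, and uses hypothetical helper names (`holds_fd`, `single_attribute_fds`, `preserved_fraction`) and a toy dataset invented for illustration. The article's measure for logical dependencies is not reproduced here.

```python
from itertools import permutations


def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts).

    The dependency holds when each value of lhs is paired with
    exactly one value of rhs across all rows.
    """
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True


def single_attribute_fds(rows, columns):
    """Collect every ordered pair (a, b) of distinct columns with a -> b."""
    return {(a, b) for a, b in permutations(columns, 2) if holds_fd(rows, a, b)}


def preserved_fraction(real_rows, synth_rows, columns):
    """Fraction of the real data's dependencies that also hold in the synthetic data.

    Returns None when the real data has no single-attribute dependencies.
    """
    real_fds = single_attribute_fds(real_rows, columns)
    if not real_fds:
        return None
    kept = {fd for fd in real_fds if holds_fd(synth_rows, *fd)}
    return len(kept) / len(real_fds)


# Toy data (hypothetical): in the real table, city determines country.
columns = ["city", "country", "age"]
real = [
    {"city": "Paris", "country": "FR", "age": 30},
    {"city": "Paris", "country": "FR", "age": 41},
    {"city": "Lyon", "country": "FR", "age": 30},
    {"city": "Rome", "country": "IT", "age": 41},
]
# The synthetic table pairs Paris with two countries, breaking city -> country.
synth = [
    {"city": "Paris", "country": "FR", "age": 33},
    {"city": "Paris", "country": "IT", "age": 29},
    {"city": "Rome", "country": "IT", "age": 41},
]

print(single_attribute_fds(real, columns))
print(preserved_fraction(real, synth, columns))
```

One caveat with this sketch: a column whose values are all unique trivially determines every other column, so on small samples some detected dependencies may be coincidental rather than meaningful.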