LG · AI · ML · Feb 4, 2022

Generative Modeling of Complex Data

arXiv:2202.02145v1 · 6 citations
AI Analysis

This work addresses the need for synthetic data generation in real-world applications with complex data structures, and it proposes a novel method rather than an incremental improvement.

The paper tackles the problem of generating synthetic data with complex structures, such as composite and nested types, which existing tabular models cannot handle. It proposes a generic framework and a causal transformer implementation that outperforms state-of-the-art models on standard benchmarks and achieves strong results on hierarchical datasets that were previously out of reach.

In recent years, several models have improved the capacity to generate synthetic tabular datasets. However, these models focus on synthesizing simple columnar tables and are not usable on real-life data with complex structures. This paper puts forward a generic framework to synthesize more complex data structures with composite and nested types. It then proposes one practical implementation, built with causal transformers, for structs (mappings of types) and lists (repeated instances of a type). The results on standard benchmark datasets show that this implementation consistently outperforms current state-of-the-art models in terms of both machine learning utility and statistical similarity. Moreover, it shows very strong results on two complex hierarchical datasets with multiple levels of nesting and sparse data, which were previously out of reach.
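The struct/list type system named in the abstract can be illustrated with a small schema. The following is a minimal sketch under stated assumptions, not the authors' implementation: the classes `Leaf`, `Struct` and `ListOf`, the `<list>`/`</list>` delimiter tokens, the example field names, and the depth-first `flatten`/`unflatten` encoding are all hypothetical. It shows one way a nested record could be linearized into a single token sequence of the kind a left-to-right (causal) model consumes, and then recovered without loss.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


@dataclass
class Leaf:
    """A primitive column type; the name ('int', 'category', ...) is illustrative."""
    name: str


@dataclass
class Struct:
    """A composite type: a mapping from field names to child types."""
    fields: Dict[str, "Schema"]


@dataclass
class ListOf:
    """A nested type: a variable number of repeated instances of one child type."""
    item: "Schema"


Schema = Union[Leaf, Struct, ListOf]

# Hypothetical delimiter tokens; assumes no leaf value equals either string.
BEGIN, END = "<list>", "</list>"


def flatten(schema: Schema, value: Any) -> List[Any]:
    """Serialize a nested value into a depth-first token sequence."""
    if isinstance(schema, Leaf):
        return [value]
    if isinstance(schema, Struct):
        tokens: List[Any] = []
        for field, child in schema.fields.items():  # fixed field order
            tokens += flatten(child, value[field])
        return tokens
    # ListOf: wrap the items in delimiters so the list length is recoverable.
    tokens = [BEGIN]
    for item in value:
        tokens += flatten(schema.item, item)
    tokens.append(END)
    return tokens


def unflatten(schema: Schema, tokens: List[Any], pos: int = 0) -> Tuple[Any, int]:
    """Inverse of flatten: rebuild the value, returning (value, next position)."""
    if isinstance(schema, Leaf):
        return tokens[pos], pos + 1
    if isinstance(schema, Struct):
        out: Dict[str, Any] = {}
        for field, child in schema.fields.items():
            out[field], pos = unflatten(child, tokens, pos)
        return out, pos
    if tokens[pos] != BEGIN:
        raise ValueError(f"expected {BEGIN!r} at position {pos}")
    pos += 1
    items: List[Any] = []
    while tokens[pos] != END:  # read items until the closing delimiter
        item, pos = unflatten(schema.item, tokens, pos)
        items.append(item)
    return items, pos + 1


# Hypothetical hierarchical record: a person with a list of visits.
patient = Struct({
    "age": Leaf("int"),
    "visits": ListOf(Struct({"code": Leaf("category"), "cost": Leaf("float")})),
})
record = {"age": 54, "visits": [{"code": "A1", "cost": 12.5}, {"code": "B7", "cost": 3.0}]}
tokens = flatten(patient, record)
# tokens == [54, "<list>", "A1", 12.5, "B7", 3.0, "</list>"]
```

With this kind of encoding, list lengths are implicit in the sequence: a left-to-right generator would end a list by emitting the closing token. Whether the paper represents list lengths this way is an assumption of the sketch.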
