LG · Apr 2

SEDGE: Structural Extrapolated Data Generation

Kun Zhang, Jiaqi Sun, Yiqing Li, Ignavier Ng, Namrata Deka, Shaoan Xie

CMU

arXiv:2604.02482 · 70.3 · h-index: 17

Predicted impact: top 25% in LG · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses the need for controlled data generation in machine learning, though it appears incremental because it builds on existing optimization and sampling methods.

The paper tackles the problem of generating data that satisfy new specifications. It proposes the SEDGE framework, which gives conditions for reliable generation and for approximate identifiability of the distribution of the generated data under conservative assumptions. The approach is verified on synthetic data and on image generation.

This paper proposes a framework for Structural Extrapolated Data GEneration (SEDGE) based on suitable assumptions on the underlying data generating process. We provide conditions under which data satisfying new specifications can be generated reliably, together with the approximate identifiability of the distribution of such data under certain "conservative" assumptions. On the algorithmic side, we develop practical methods to achieve extrapolated data generation, based on either a structure-informed optimization strategy or diffusion posterior sampling. We verify the extrapolation performance on synthetic data and also consider extrapolated image generation as a real-world scenario to illustrate the validity of the proposed framework.
