LGFeb 14, 2018

Geometry-Based Data Generation

arXiv:1802.04927v436 citations
Originality Incremental advance
AI Analysis

This addresses data quality issues in generative modeling for applications like correcting missing data and finding relationships, but it is incremental as it builds on diffusion processes and manifold learning.

The paper tackles the problem of generative models replicating biased data density by proposing SUGAR, a method that learns manifold geometry and generates points evenly along it, correcting sampling biases and artifacts while revealing intrinsic patterns.

Many generative models attempt to replicate the density of their input data. However, this approach is often undesirable, since data density is highly affected by sampling biases, noise, and artifacts. We propose a method called SUGAR (Synthesis Using Geometrically Aligned Random-walks) that uses a diffusion process to learn a manifold geometry from the data. Then, it generates new points evenly along the manifold by pulling randomly generated points into its intrinsic structure using a diffusion kernel. SUGAR equalizes the density along the manifold by selectively generating points in sparse areas of the manifold. We demonstrate how the approach corrects sampling biases and artifacts, while also revealing intrinsic patterns (e.g. progression) and relations in the data. The method is applicable for correcting missing data, finding hypothetical data points, and learning relationships between data features.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes