CV | LG | Nov 29, 2022

Procedural Image Programs for Representation Learning

MIT
arXiv:2211.16412v2 | 38 citations | h-index: 140
Originality: Incremental advance
AI Analysis

This paper addresses the limited scalability and expert dependency of synthetic data generation for machine learning research, though it is an incremental improvement over existing methods.

The paper tackles the challenge of scaling up synthetic image generation for representation learning by proposing a dataset of 21,000 procedural programs, which reduces the performance gap between pre-training with real and procedurally generated images by 38%.
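To make the 38% figure concrete, here is a minimal sketch of how such a number could be computed, assuming it is a relative reduction in the accuracy gap to real-image pre-training. The function name `gap_reduction` and all accuracy values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def gap_reduction(real_acc: float, old_synth_acc: float, new_synth_acc: float) -> float:
    """Fraction of the real-vs-synthetic accuracy gap that the new method closes."""
    gap_before = real_acc - old_synth_acc  # gap left by the earlier synthetic method
    gap_after = real_acc - new_synth_acc   # gap left by the new synthetic method
    return (gap_before - gap_after) / gap_before


# Hypothetical accuracies (percent), not taken from the paper.
reduction = gap_reduction(real_acc=60.0, old_synth_acc=10.0, new_synth_acc=29.0)
print(f"{reduction:.0%}")  # the gap shrinks from 50 to 31 points, i.e. by 38%
```

Under this reading, a 38% reduction means real-image pre-training still leads, but the remaining gap is a little under two thirds of what it was.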

Learning image representations using synthetic data allows training neural networks without some of the concerns associated with real images, such as privacy and bias. Existing work focuses on a handful of curated generative processes which require expert knowledge to design, making it hard to scale up. To overcome this, we propose training with a large dataset of twenty-one thousand programs, each one generating a diverse set of synthetic images. These programs are short code snippets, which are easy to modify and fast to execute using OpenGL. The proposed dataset can be used for both supervised and unsupervised representation learning, and reduces the gap between pre-training with real and procedurally generated images by 38%.
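As a rough illustration of what "a short program that generates a diverse set of images" can look like, the sketch below is a hypothetical stand-in written in Python with NumPy. It is not one of the dataset's programs, which the abstract says run on OpenGL; the function name `render`, the sinusoid construction, and all parameter ranges are assumptions made for this example.

```python
import numpy as np


def render(seed: int, size: int = 64) -> np.ndarray:
    """Hypothetical procedural image program: a sum of random sinusoids per colour channel.

    Each seed yields a different image, so one short program defines a whole image family.
    """
    rng = np.random.default_rng(seed)
    # Normalised pixel coordinates in [0, 1).
    ys, xs = np.mgrid[0:size, 0:size] / size
    image = np.zeros((size, size, 3))
    for channel in range(3):
        for _ in range(4):
            # Random spatial frequencies and phase for one wave component.
            fx, fy = rng.uniform(1.0, 8.0, size=2)
            phase = rng.uniform(0.0, 2.0 * np.pi)
            image[..., channel] += np.sin(2.0 * np.pi * (fx * xs + fy * ys) + phase)
    # Rescale to [0, 1] so the result can be saved or fed to a network.
    return (image - image.min()) / (image.max() - image.min())


# Sampling several seeds from the same program gives a small batch of distinct images.
batch = np.stack([render(seed) for seed in range(8)])
print(batch.shape)
```

Changing a single line, such as the frequency range or the number of wave components, gives a new generative process, which is the property that makes collections of such programs easy to scale.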

Code Implementations: 1 repo
